In this article we review the framework for spontaneous replica symmetry breaking. Subsequently
it is applied to the example of the statistical mechanical description of the storage properties of
a McCulloch-Pitts neuron, i.e., the simple perceptron. It is shown that in the neuron problem the
general formula appears that is at the core of all problems admitting Parisi's replica symmetry 
breaking ansatz with a one-component order parameter. The details of Parisi's method are reviewed 
extensively, with regard to the wide range of systems where the method may be applied. Parisi's 
partial differential equation and related differential equations are discussed, and a Green function 
technique introduced for the calculation of replica averages, the key to determining the averages 
of physical quantities. The Green function of the Fokker-Planck equation due to Sompolinsky
turns out to play the role of the statistical mechanical Green function in the graph rules for replica 
correlators. The subsequently obtained graph rules involve only tree graphs, as appropriate for 
a mean-field-like model. The lowest order Ward-Takahashi identity is recovered analytically and 
shown to lead to the Goldstone modes in continuous replica symmetry breaking phases. The need 
for a replica symmetry breaking theory in the storage problem of the neuron has arisen due to the 
thermodynamical instability of formerly given solutions. Variational forms for the neuron's free 
energy are derived in terms of the order parameter function x(q), for different prior distributions of
synapses. Analytically in the high temperature limit and numerically in generic cases various phases 
are identified, among them one similar to the Parisi phase in long range interaction spin glasses. 
Extensive quantities like the error per pattern change slightly with respect to the known unstable 
solutions, but there is a significant difference in the distribution of non-extensive quantities like the 
synaptic overlaps and the pattern storage stability parameter. A simulation result is also reviewed 
and compared to the prediction of the theory. 
I. INTRODUCTION AND OVERVIEW



A. Introduction 



In the past one and a half decades, statistical physical methods have yielded a rich harvest of theoretical and practical
results in the exploration of artificial neural network models. In contrast to more traditional mathematical approaches, 
such as combinatorics, statistical data analysis, graph theory, or mathematical learning theory, the main emphasis in 
statistical physics lies on interconnected model neurons, considered as a physical many-body problem, in the limit 
of large number of variables. The latter property renders the problems similar to statistical mechanical systems in 
the thermodynamical limit, that is, when the number of particles is very large. This does not necessarily mean a
large number of units in a neural network; the thermodynamic limit applies also in the case of a single neuron if the
number of adjustable variables, the analog of synaptic strengths of biological neurons, is sufficiently large. A much
studied type of network is constructed from the McCulloch-Pitts model neuron [1], also called the single-layer, or simple,
perceptron if it is operating alone as a single unit. In this paper we will examine the single model neuron's ability
to store, i.e., to memorize, patterns, which is crucial for the understanding of networked systems.

The paper is strictly about the artificial model neuron, and does not imply biological relevance. However, the 
notions neuron and synapses, the latter designating coupling strength parameters, are biologically inspired, and we will
use them throughout.

We shall apply the statistical mechanical framework introduced by Gardner and Derrida f|-|] in 1988-89, which 
gave birth to a subfield of the theory of neural networks. Since then, the McCulloch-Pitts neuron has become well
understood below the storage capacity, where patterns, or examples, can be perfectly stored. The region beyond it,
however, has remained the subject of continuing research [6-12]. If the number of patterns exceeds the capacity then
there is no way of storing all of them. One possible approach beyond capacity is to choose a quantity to be optimized.
Examples of such a quantity are the stability of the patterns (in other words, their resistance to errors during retrieval)
or the number of correctly stored patterns irrespective of their stability. Such problems can be formulated by means
of a cost, or energy, function, giving rise to a statistical mechanical system. In the case of minimization of the number
of incorrectly stored patterns, difficulties have arisen on every front where the problem was attacked. On the one 
hand, the analytical method inherited from spin glass research is no longer applicable in its simplest form, that is, 
the so-called replica symmetric (RS) ansatz breaks down. On the other hand, near and beyond capacity numerical 
algorithms begin to require excessive computational power. The physical picture behind that is the roughening of the 
landscape of the cost function the algorithms try to minimize. 

Phases of similar complexity, wherein the optimum-finding algorithm, the analog of the dynamics in the statistical 
mechanical system, slows down to the extent that can be considered as breakdown of ergodicity, were observed in 
combinatorial optimization problems and still keep eluding analysis [p^|-p^|. Several empirically hard optimization 
problems [13], including minimization of error beyond storage capacity for the McCulloch-Pitts neuron [16], are known
to belong to the so-called non-deterministic polynomial (NP) complete class. It is of significance if, by means of
statistical physical methods, some properties of the energy, or free energy, landscape of NP-complete problems can be
clarified. The statistical physical equivalents of a few NP-complete systems were shown, in averaged thermal
equilibrium, to exhibit spin-glass-like behavior [14]. That gives rise to the belief that there may be a general connection
between NP-completeness and spin glass behavior. Thus the identification and description of such thermodynamic
phases may be instructive from the algorithmic viewpoint as well. It should be emphasized that NP-complete
optimization problems are of diverse origin and many of their quantitative properties show little resemblance. Accordingly,
those reformulated as statistical mechanical systems exhibit different thermodynamic behavior, e.g., in averaged
equilibrium they have different phase diagrams. Nevertheless, by the notion of glassy phases statistical physics may provide us
with a common concept for understanding at least some ingredients of NP-completeness. 

It is the region beyond capacity of a single McCulloch-Pitts neuron that we claim to uncover in the present paper, 
within the averaged statistical mechanical description of thermal equilibrium. While the theoretical framework is in 
some respects different from, rather a generalization of, the techniques applied to the Ising spin glass, we can now 
reinforce the so far vague expectation about the appearance of a spin glass phase and deliver quantitative results. 
Networks beyond saturation have long been known to have complex features; here we demonstrate that even a single neuron
can exhibit extreme complexity. 

The present article grew out of the work with P. Reimann, presented in a letter [17]. A more extended article, still
in many respects a summary of the main results, has been accepted for publication [18]. The emphasis in the present
paper is twofold. On the one hand, we give a comprehensive review of the technical details of the replica symmetry
breaking theory, including the so-called continuous replica symmetry breaking. At the core is Parisi's original theory,
which is here technically generalized to incorporate also the neuron problem. Furthermore, several extensions of the
theory are introduced here that are applicable also to spin glasses. In the mathematical parts an educative and
self-contained line of reasoning is favored over a terse style. By that we would like to fill a gap in the literature
on the theory of disordered systems, in that we present the technical details to those wishing to understand Parisi's
method and possibly to apply it to other problems. On the other hand, we apply the theory to the storage problem
of a single neuron. Since the first statistical mechanical approach to this question, several other neural functions 
have been treated by statistical mechanical methods, and some of those may be more important for applications than 
pattern storage. However, even storage represents a strong theoretical challenge. Besides the Little-Hopfield model of
auto-associative memory [19], it can be considered as a point of entry of the statistical mechanical approach into hard
problems in the field of artificial neural networks, and may open the way for further applications. 

On the technical side, the paper is centered on Parisi's method, successful in solving the mean equilibrium
properties of the infinite range interaction Ising spin glass, the Sherrington-Kirkpatrick model [14]. It turns out that
after some generalization [17, 18] of the original method [20-23], this becomes adaptable to the statistical mechanical
formulation by Gardner and Derrida |^,^| of the neuron problem. The single neuron with a general cost function, 
i. e., error measure, was introduced by Griniasty and Gutfreund Q and called by them potential. We show that this 
model will give rise to the most general term that admits Parisi's solution with one order parameter function. By the
Parisi solution we mean for now the hierarchical structure of the order parameter matrix that gives rise to the
nonlinear partial differential equation introduced by Parisi in an auxiliary role, allowing continuous replica symmetry 
breaking. 

We would like to point out that all systems studied by means of the Parisi ansatz with one order parameter 
matrix, like the multi-spin interaction Ising [24] and the Potts glass near criticality [25], contained as special cases the
aforementioned general term. Therefore, our results about the Parisi solution of the neuron go well beyond the scope
of neural computation. Here we call the reader's attention to the fact that Parisi's method has been applied to the
study of metastable states in the Sherrington-Kirkpatrick model [26], where in fact three order parameter matrices
emerged. That work indicates how the continuous replica symmetry breaking solution is to be obtained there and 
implicitly suggests the generalization to vector order parameters as we outline in this paper. 

Beyond giving a comprehensive account of Parisi's framework, we shall perform a concrete field theoretical study, 
including the calculation of averages, graph rules involving Green functions for the evaluation of correlation functions, 
analytic derivation of a Ward-Takahashi identity, and integral expressions for the generalized susceptibilities necessary 
to determine thermodynamic stability of the solution. The insightful works about the second and higher order 
correlations of the magnetization in the continuous replica symmetry breaking phase of the Sherrington-Kirkpatrick 
model present concrete examples for field expectation values [27, 28]. These are generalized by our formulation in this
article, new even in the context of spin glass problems. With the notable exception of the Sherrington-Kirkpatrick
model and the formally analogous Little-Hopfield system, where the low temperature phase was also extensively
described [27-31], most studies of long range interaction disordered systems concerned the region near criticality. The
framework we present here is naturally designed for application deeply within the glassy phase. 

The differences between the Sherrington-Kirkpatrick and neuron models are obvious at first sight. The former 
is an Ising-type system, with a multiplicative two-spin interaction. In contrast, our main focus here is a spherical
model neuron, i.e., one whose microstates are characterized by the synaptic couplings, which are continuous and arbitrary up to
an overall normalization factor. The interaction between synapses is mediated by the error measure potential of
Griniasty- Gutfreund |(|, a function arbitrary to a large extent. In this light one may find the close analogy between 
disordered spin systems and the neuron model somewhat surprising. The similarity becomes, however, apparent when 
the statistical mechanical system is reduced to a variational problem in terms of a single order parameter function. 
Such have been available for the Sherrington-Kirkpatrick model, whereas we have constructed one for the single 
neuron. The variational framework is brief; it allows a quick derivation of the stationarity relations, gives account of
thermodynamic stability in a subspace called longitudinal, and is of help in numerical computations. The differences 
between the Sherrington-Kirkpatrick and the neuron problems may be small in the variational free energy formula, 
but are still the cause of technical complications for the neuron problem. The physical reasons are that, firstly,
the neuron does not possess the spin flip symmetry of the spin glass without external field, and secondly, the neuron's
error measure potential is more complicated than the multiplicative spin exchange energy term. Thus a few special
properties of the Sherrington-Kirkpatrick model that allowed for some analytic results and simplified numerics [27, 29]
are absent in the neuron. Generically similar complications may arise in other spin glass variants, so the much studied
Sherrington-Kirkpatrick model is to be considered as a rather special, simple case.

It is worth mentioning briefly two important areas among the many we do not treat in this paper. First and 
foremost, we do not discuss here the dynamical evolution of disordered systems. Since the ground-breaking early 
works on the dynamics of the Sherrington-Kirkpatrick model by Sompolinsky and Zippelius [32-35], and the
path-integral formulation for Ising spins by Sommers [36], many aspects of the dynamics of disordered systems have been
clarified. They proved essential for the understanding also of numerical algorithms. However, one has to reckon that 
even averaged, stable equilibrium properties of complex phases of disordered systems are still far from clarified. The 
existence of many metastable states, the signature of glassy systems, and the ensuing complex nature of dynamical 
evolution, often termed as breakdown of ergodicity, puts in doubt even the existence of thermal equilibrium. On
several model systems, however, extensive numerical simulations have demonstrated that equilibrium properties, 
averaged over the quenched disorder, can carry physical meaning. These properties are the subject of the present 
article. Secondly, from the viewpoint of mathematical rigor, the replica method raises many a question that we leave 
unanswered. In fact, quite a few scientists view this method with suspicion, partly because the limit of "zero number 
of replicated systems" may seem to violate physical intuition. However, the large number of simulations confirming 
replica symmetric solutions, and the fewer ones supporting replica symmetry breaking, as well as the absence of 
numerical results outright disproving the theory to this date, should provide ground for confidence. Theoretical 
physics often employs methods of seemingly shaky mathematical foundations, whose confirmation may come from 
comparison with real or numerical experiments. Such a confirmation may then trigger rigorous clarification. 



B. Overview 



Here we give a review of what subsequent sections are about. Section II introduces some fundamental concepts and
gives a brief historical review on neural modeling and, to a very basic extent, on the Sherrington-Kirkpatrick model of
spin glasses. In Section III the single McCulloch-Pitts neuron is described as a statistical mechanical system following
Gardner, Derrida ^,|| and Griniasty, Gutfreund ||. Pattern storage is interpreted as an optimization problem in 
the space of synaptic coupling strengths, and the ensuing thermodynamic picture is outlined. The replica free energy 
for various prior distributions of synapses is derived, such as the spherical constraint as well as arbitrary distribution 
of independent synapses. Highlighted is the central role of the neural local stability parameter, whose distribution 
gives through a simple formula the average error. Most of this section recites known concepts, with a few new details. 
Sections IV, V, and VI are devoted to the Parisi solution. We start out from the "hard" term in the replica free
energy of the neuron, that can be considered as a generalization of free energy terms emerging from the classic long
range interaction, disordered, spin problems. In Section IV a comprehensive presentation of the Parisi solution is given,
including the derivation of Parisi's partial differential equation. It is demonstrated that this equation incorporates all
finite replica symmetry breaking ansatze, besides continuous replica symmetry breaking. Parisi's partial differential
equation gives rise to a collection of related partial differential equations; they are reviewed here, and several useful
Green functions are presented, among them prominently the Green function for Parisi's partial differential equation.
Section V contains new results such as analytic expressions for expectation values and correlation functions of replica
variables. The eigenvalues of the Hessian of the replica free energy are discussed, determining thermodynamic stability.
The Green function of Parisi's partial differential equation turns out to be the field theoretical Green function that
correlators are composed of, and allows the introduction of a graph technique. Section VI discusses a few aspects of
the Parisi solution and two particular cases where Parisi's partial differential equation can be explicitly solved. We
return to the special problem of the model neuron in Sections VII and VIII, and apply the rather abstract results of
the preceding sections to it. The case of continuous synapses with the spherical constraint, including the conditions
of stationarity and thermodynamic stability, is analyzed in detail in Section VII. In the limit of high temperature
and large number of patterns the formalism becomes easily manageable, while exhibiting a nontrivial phase diagram
with three different glassy states. This section contains our variational approach, the main result being a variational
free energy whence thermodynamic properties can be straightforwardly derived and numerically explored. By means
of the various partial differential equations several relations about the stationary state are uncovered. The scaling
required when the temperature goes to zero is also described. The variational free energy is numerically evaluated
for several characteristic parameter settings, together with the order parameter function and the probability density of
local stabilities. Previous simulation data [57] were improved upon in Ref. [18], whence we redisplay the comparison
of simulation results with the theoretical prediction. The case of arbitrarily distributed independent synapses is
considered and the corresponding variational framework presented in Section VIII. Often used abbreviations are
listed in Appendix A. Further appendixes contain more technical details. Appendix B gives the derivation of the
replica free energy for synapses with spherical as well as with independent but otherwise arbitrary normalization.
Appendix C bridges a gap in the calculation of Section IV. In Appendix D the short way of deriving Parisi's partial
differential equation is given, which requires the continuity of the order parameter function. Note that this
equation is valid even in the case of discontinuities, but then the derivation, as shown in Section IV A 2, is more
involved. We do not pursue in the paper the case of vector order parameters, but give a brief account of how Parisi's
partial differential equation for a vector field emerges in Appendix E. A technically useful identity between Green
functions is derived in Appendix F and the high temperature limit of some relevant partial differential equations is presented
in Appendix G. The only case where we can show longitudinal stability far from criticality is analyzed in Appendix H.
As also stated in the Acknowledgment, sections with special contributions by P. Reimann are marked by a *. 
II. ARTIFICIAL NEURAL NETWORKS AND SPIN GLASSES*



The purpose of this section is to put the often technical analysis of later parts of the paper in the wider context 
of neural networks and spin glasses. The central issues of this work are the intricate details of Parisi's continuous
replica symmetry breaking (CRSB) scheme and the adaptation of the method to the equilibrium storage
properties of a McCulloch-Pitts neuron, or simple perceptron. We have made an attempt to cover the most relevant
literature on these two narrower themes. On the other hand, we also mention other subjects like learning algorithms, 
generalization and unsupervised learning, layered perceptrons, and spin glass models, where our selection of references 
is far from complete, and not necessarily even representative. 



A. The McCulloch-Pitts neuron and perceptrons 

The model of a neuron as put forth by McCulloch and Pitts in a ground-breaking paper [1] in 1943 has attracted
continuous interest ever since [19]. While inspired by real neurons in the brain, it is oversimplified from the biological
viewpoint. The model neuron can assume two states, one "firing", the other "quiescent". The state depends on input 
signals, obtained possibly from other such units, and on the coupling parameters that weight the inputs. The couplings 
are often termed "synaptic" in reference to the synapses, the connection points of biological neurons. Mathematically 
speaking, the model neuron computes the projection of an $N$-dimensional input $\mathbf{S}$ along a vector $\mathbf{J}$ of synaptic
couplings and outputs $\xi = 1$ (say it fires) or $\xi = -1$ (it is quiescent) according to the sign of this product $\mathbf{J} \cdot \mathbf{S}$ as

$$\xi = \mathrm{sign}\left(\mathbf{J} \cdot \mathbf{S}\right). \qquad (2.1)$$

The argument of the sign can be extended by a constant threshold, which alternatively may be represented by $J_1$
if one only allows $S_1 = 1$ as input. Remarkably, as already McCulloch and Pitts noticed, a sufficiently large collection
of such model neurons, when properly connected and the couplings properly set, can represent an arbitrary Boolean
function. The model can be naturally extended to continuous outputs, when the sign function is replaced by a
continuous transfer function, generally of sigmoid shape [19].
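
The input-output rule of Eq. (2.1), together with the remark on the threshold, can be illustrated by a minimal numerical sketch. The function names, the random example data, and the convention that a vanishing argument is mapped to +1 are assumptions made here for illustration only; they are not part of the model definition above.

```python
import numpy as np


def neuron_output(J, S, threshold=0.0):
    """Output xi = sign(J . S - threshold) of a McCulloch-Pitts unit.

    A vanishing argument is mapped to +1; this tie-breaking is an assumption.
    """
    local_field = float(np.dot(J, S)) - threshold
    return 1 if local_field >= 0.0 else -1


def with_threshold_as_coupling(J, S, threshold):
    """Absorb the threshold into an extra coupling acting on a constant input 1."""
    J_ext = np.concatenate(([-threshold], J))
    S_ext = np.concatenate(([1.0], S))
    return J_ext, S_ext


rng = np.random.default_rng(0)
N = 8
J = rng.normal(size=N)                 # synaptic couplings (illustrative values)
S = rng.choice([-1.0, 1.0], size=N)    # binary input pattern
xi = neuron_output(J, S, threshold=0.3)

# The same output is obtained when the threshold is represented as a coupling.
J_ext, S_ext = with_threshold_as_coupling(J, S, threshold=0.3)
xi_ext = neuron_output(J_ext, S_ext)
```

Representing the threshold as an additional coupling keeps the rule in the homogeneous form of Eq. (2.1), which is the form used in the rest of the discussion.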

The next major step forward was achieved with the introduction of the perceptron concept by Rosenblatt. The
idea is to place a number of McCulloch-Pitts neurons into different layers, with the output of neurons in one layer 
serving as input for those in the next layer, hence its name multi-layer feedforward network. As it was intended to 
model vision, such a network is also called multi-layer perceptron. The input to the network as a whole goes into the 
first layer, while the final output is that of the last layer. A widely applied learning concept is to try to determine 
appropriate synaptic couplings J for all the neurons so as to satisfy a prescribed set of input-output data, called 
training examples. In other words, the aim is to store the training examples. One of the motivations for doing so is 
that a possibly existing systematics behind the training examples may be approximately reproduced also on previously 
unseen inputs, that is, the network will be able to generalize. The special case of a single McCulloch-Pitts unit is 
a single-layer perceptron, called also simple perceptron by Rosenblatt, and lately sometimes just perceptron. For
the simple perceptron with binary outputs, as defined in Eq. (2.1), he proposed an explicit learning algorithm that
provably converges towards a vector of synaptic couplings J, which correctly stores the training examples, provided
such a J exists. Simultaneously with Widrow and Hoff Pq , ^9| , he also studied two-layer perceptrons with an adaptive 
second layer, while using the first layer as preprocessor with fixed (non-adaptive) synaptic couplings, however, without 
being able to generalize his learning algorithm to this case. 
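
As an illustration of learning by examples in the simple perceptron, the following sketch implements the error-correcting update in the form commonly quoted in textbooks; whether it coincides in every detail with Rosenblatt's original formulation is not claimed here. The function name, the step size 1/sqrt(N), the sweep limit, and the use of a random "teacher" vector to generate storable examples are all assumptions of this sketch.

```python
import numpy as np


def perceptron_learn(patterns, targets, max_sweeps=1000):
    """Error-correcting rule: whenever pattern mu is not stored correctly,
    add targets[mu] * patterns[mu] (suitably scaled) to the couplings J."""
    P, N = patterns.shape
    J = np.zeros(N)
    for _ in range(max_sweeps):
        errors = 0
        for S, xi in zip(patterns, targets):
            if xi * np.dot(J, S) <= 0.0:      # pattern not (yet) stored
                J += xi * S / np.sqrt(N)      # step size is a convention
                errors += 1
        if errors == 0:                        # all examples stored
            return J, True
    return J, False


rng = np.random.default_rng(1)
N, P = 50, 20
patterns = rng.choice([-1.0, 1.0], size=(P, N))

# Targets produced by a hypothetical teacher vector, so that a solution J exists.
teacher = rng.normal(size=N)
targets = np.sign(patterns @ teacher)

J, converged = perceptron_learn(patterns, targets)
```

Because the targets are generated by a teacher, a storing vector exists and the convergence statement quoted above applies; for example sets beyond capacity the loop would simply stop at the sweep limit without storing all patterns.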



The field was driven into a crisis by the observation of Minsky and Papert [40] that the simple perceptron (2.1)
is unable to realize certain elementary logical tasks. Confidence returned when the so-called error back-propagation 
learning algorithm began to gain wide acceptance (see (4^] and further references in Jlj|). This algorithm performs 
training by examples of fully adaptive multi-layer feedforward networks with generically differentiable transfer function.
Such networks, if chosen sufficiently large, are known to be capable of realizing arbitrary smooth input-output
relations see, e. g., [i^Jiq] . Though this algorithm and its various descendants converge often quite slowly and in 
principle one cannot exclude that they get stuck before reaching a desired state, they have been successful in a great 
variety of practical applications pl| . 



B. Associative memory 

Besides the layered feedforward perceptron architectures, a second eminent problem in neural computation is the 
so-called associative memory network or attractor network. We limit our discussion to the auto-associative case, 
i. e., the memory network is addressable by its own content. The concept can be traced back to Refs. and
rediscovered later (see Jl^] for further references). The recurrent (in contrast to feedforward) network of McCulloch- 
Pitts model neurons, originally suggested in Refs. [46-48], was especially suited for the task. A recurrent network contains
interconnected units where signals pass through directed links that can form loops. Here the desired patterns to be 
stored correspond to collective states of the units in the network, and the idea is to define a discrete-time dynamics 
of the states so that the prescribed patterns (examples) are fixed point attractors. For a collection of $N$ model neurons
(2.1), the outputs $\xi_k$, $k = 1, \ldots, N$, at a given time step $t$ are taken as new inputs $S_k(t)$ for the next time step. Denoting
by $J_{ik}$ the synaptic coupling by which the $i$-th neuron weights the signal $S_k(t)$ stemming from the $k$-th neuron, we
can write the discrete-time dynamics of neurons with binary output as

$$S_i(t+1) = \mathrm{sign}\left(\sum_{k=1}^{N} J_{ik} S_k(t)\right), \qquad (2.2)$$

where self-interactions are usually excluded by setting $J_{ii} = 0$. Taking an input pattern $\mathbf{S} = \mathbf{S}(0)$ as initial condition,
we understand in the definition (2.2) that the update is done sequentially, either by scanning through the $S_i$-s one
after the other, $i = 1, 2, \ldots, N, 1, 2, \ldots$, or by randomly selecting the sites $i$ one after the other. Such a dynamics is
supposed to evolve towards the closest attractor (hence the name attractor network). If this attractor is a fixed point,
which is the case if the couplings are symmetric, $J_{ik} = J_{ki}$ for all $i, k$, a previously unseen pattern $\mathbf{S}$ can be associated
with one of the stored examples, assumed to be the "most similar" of all stored patterns, hence the name associative
memory network. Note that for such an associative memory network patterns $\mathbf{S}$ have binary components $S_k = \pm 1$.
In case of units with continuous states one only requires that the length $|\mathbf{S}|$ goes like $N^{1/2}$ for $N \to \infty$. We mention
that if the synaptic couplings are non-symmetric, convergence to a fixed point is no longer certain and chaos can arise.

Given the patterns to be stored, the aim is to construct a dynamics with prescribed attractors. This is the reverse
of and possibly more difficult than the more conventional problem of finding the attractors for a given dynamical
system. If we accept the neural dynamics like in (2.2), the task is then to set the $J_{ik}$ couplings to such values that
lead to the desired attractors. In his pioneering works [51, 52] Little suggested an approach to this problem by giving
an explicit form for the synaptic couplings $J_{ik}$ of the McCulloch-Pitts neurons as inspired by the ideas of Hebb [53]
about the working of brain cells. Little defined a parallel update rule for (2.2) and included a stochastic element
characterized by temperature. Hopfield's milestone contribution [54, 55] consisted in reformulating the dynamics (2.2)
as a sequential update algorithm, which led to an optimization problem with an energy function. We will call the
network with the dynamics (2.2) associative memory, while in the special case, when the synaptic couplings $J_{ik}$ are
chosen according to the Hebb rule, the name Little-Hopfield model will be used. For a neuro-physiological argument
for a non-Hebbian learning rule we refer to [56].
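
The sequential dynamics (2.2) with symmetric couplings can be sketched numerically as follows. The explicit Hebb-type couplings used here, $J_{ik} = N^{-1} \sum_\mu \xi_i^\mu \xi_k^\mu$ with $J_{ii} = 0$, are the outer-product form commonly quoted in the literature and are an assumption of this sketch, since the text above does not spell the rule out. The function names, the system size, the number of flipped units, and the convention that a vanishing local field leaves a unit unchanged are likewise illustrative choices.

```python
import numpy as np


def hebb_couplings(patterns):
    """Outer-product couplings J_ik = (1/N) sum_mu xi_i^mu xi_k^mu with J_ii = 0
    (assumed form of the Hebb rule)."""
    P, N = patterns.shape
    J = patterns.T @ patterns / N
    np.fill_diagonal(J, 0.0)
    return J


def energy(J, S):
    """Energy -1/2 sum_{i,k} J_ik S_i S_k of a configuration S."""
    return -0.5 * float(S @ J @ S)


def relax(J, S0, tol=1e-12):
    """Sequential zero-temperature dynamics (2.2), swept until nothing changes.

    For symmetric J with zero diagonal every flip lowers the energy, so the
    loop ends in a fixed point. A (numerically) vanishing local field leaves
    the unit unchanged; this tie-breaking is an assumption.
    """
    S = S0.copy()
    changed = True
    while changed:
        changed = False
        for i in range(len(S)):
            h = J[i] @ S                  # local field on unit i
            if h * S[i] < -tol:           # unit is misaligned with its field
                S[i] = -S[i]
                changed = True
    return S


rng = np.random.default_rng(2)
N, P = 100, 3
patterns = rng.choice([-1.0, 1.0], size=(P, N))
J = hebb_couplings(patterns)

# Start from a corrupted version of the first pattern and let the network relax.
probe = patterns[0].copy()
flipped = rng.choice(N, size=10, replace=False)
probe[flipped] *= -1.0
final = relax(J, probe)
overlap = float(final @ patterns[0]) / N   # 1.0 would mean perfect retrieval
```

The quantity `overlap` measures how close the reached fixed point is to the first stored pattern; no particular value is asserted here, since retrieval quality depends on the load P/N and on the realization of the patterns.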



The associative memory network (2.2) may be appealing because it models, albeit very crudely in details, a biological 
concept, its use for practical purposes is, however, in doubt [Q. Indeed, the required storage space for the synaptic 
couplings is comparable to that for directly storing the patterns, and the computational effort of the retrieval dynamics
(2.2) is similar to a direct comparison of a given input pattern with all the stored patterns. Only with appropriate
modifications of the original setup, e.g., non-uniformly distributed patterns, may a digital implementation of the
network become advantageous 8 . For various such modifications and their possible practical use we refer to fll9|] . 



C. Sherrington-Kirkpatrick model 

Spin glasses are normal metals (e. g. Cu or Au) with dilute magnetic impurities (e. g. Mn or Fe), or, lattices 
of random mixtures of magnetic ions (e. g. Eu^Sri-^O) exhibiting a freezing transition of the spin disorder at low 
temperatures [57]. Due to spatial disorder, the spin interactions can be considered as random. The random sign
of the interactions can be the cause of one of the basic features of spin glasses, the effect of frustration H], when 
the interaction energies of all spin pairs cannot be minimized simultaneously. In a pioneering paper, Edwards and 
Anderson |39| introduced a simplified model of a spin glass, essentially an Ising system with randomly selected, but 
fixed, exchange couplings. The infinite-range interaction version of that is called Sherrington-Kirkpatrick (SK) model 
[60, 61] and is considered a realization of the mean field approximation. The theoretical analysis of the SK-model
triggered the invention of novel statistical mechanical concepts and methods which subsequently found applications 
in modified spin glass models such as the random energy 163] and p-spin interaction models |]M,EJ,p4| , the Heisenberg 



| p5| and the Potts glass JS6 67 2q], multi-p-spin and quantum spin glass models |6q-|7C|] . Methods inherited from spin 
glass theory also provided insight into many other problems, several of them originating from outside of physics. 
Prominent examples are various models of interfaces in random environment J7l|-j75| , granular media ffq ] , combinato- 
rial optimization (see Jl4[ for an early review and |15j for a new development)7game theory W%, protein and nucleic 
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acid folding J78|-|S2[, and noise reduction in signal processing |B3[. Last but not least, as we will expound it in the 
present paper, methods first introduced for describing the equilibrium properties of the SK model are of paramount 
importance in the statistical mechanical approach to neural networks. We give, therefore, a brief account of the SK 
model, concentrating on basic properties in thermal equilibrium. The general mathematical framework described in 
the main part of this paper covers the SK model as a special case. For pedagogical introductions into the calculation 
techniques we also refer to |Q, to Section 10 in [fL9| and Section 3 in A detailed discussion of the physical content 
of the solution can be found, e. g., in ]6l| , |57| , [r^ , ^4| . We only mention here that the question, whether the solution of 
the SK model provides a qualitatively appropriate description of short range interaction spin glasses, is still debated.
See Refs. [85]-[87] and [88]-[90] for two exchanges on the subject, and |H) for a review and simulation results.

The state variables of the SK model are the Ising spins $S_i = \pm 1$, interacting via random coupling strengths $J_{ik}$
($i, k = 1, 2, \ldots, N$). In the absence of an external magnetic field, the spin Hamiltonian is of the form

$$\mathcal{H}_J(S) = -\sum_{i<k} J_{ik} S_i S_k \qquad (2.3)$$

and the couplings $J_{ik}$ are independently sampled from an unbiased Gaussian distribution with variance $1/N$. The
scaling by $N$ guarantees the extensivity of the energy in the thermodynamic limit $N \to \infty$. The feature that
the interactions $J_{ik}$ are randomly chosen but then frozen, while the spins obey Boltzmannian thermodynamics, is
summarized by calling the $J$-s quenched variables. An important goal is then to calculate, in the large $N$ limit,
the free energy per spin

$$f_{SK} = -\lim_{N\to\infty} (N\beta)^{-1} \ln Z_J. \qquad (2.4)$$

Here $\beta = 1/(k_B T)$ is the inverse thermal energy unit and $Z_J$ is the partition sum $\sum_S e^{-\beta \mathcal{H}_J(S)}$ over all spin
configurations $S$. The sum over the discrete spin states is often denoted by a trace as $\mathrm{Tr}_S$. The interactions $J_{ik}$ being
quenched random variables, the expression (2.4) as it stands is analytically intractable. Physically, one expects that
two different realizations of the random interactions $J_{ik}$ will exhibit the same behavior at thermal equilibrium for
$N \to \infty$. Mathematically, this means self-averaging of the free energy density $f_{SK}$, i.e., for any randomly sampled set
of the $J_{ik}$, Eq. (2.4) yields the same result with probability 1, allowing us to rewrite $\ln Z_J$ as an average
$\langle \ln Z_J \rangle_{qu}$ over the quenched disorder $J$. Rigorous mathematical discussions of this property for the SK and the
related Little-Hopfield model can be found in Refs. [92]-[94].
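
The self-averaging statement can be probed numerically for very small systems. The following is a minimal sketch, not part of the original analysis: it assumes the Hamiltonian (2.3) with the sum restricted to pairs $i<k$, units with $k_B = 1$, and exhaustive enumeration of all $2^N$ spin states, which limits it to roughly $N \le 20$. The function names `sample_couplings` and `sk_free_energy_per_spin` are illustrative choices.

```python
import itertools
import numpy as np

def sample_couplings(N, rng):
    """Symmetric Gaussian couplings J_ik with variance 1/N and zero diagonal."""
    upper = np.triu(rng.normal(0.0, 1.0 / np.sqrt(N), size=(N, N)), k=1)
    return upper + upper.T

def sk_free_energy_per_spin(J, beta):
    """Exact f = -(N*beta)^(-1) ln Z_J obtained by enumerating all 2^N spin states."""
    N = J.shape[0]
    spins = np.array(list(itertools.product([-1, 1], repeat=N)))
    # H_J(S) = -sum_{i<k} J_ik S_i S_k; the factor 1/2 undoes the double counting
    # of pairs in the symmetric matrix J (assumed form of Eq. (2.3)).
    energies = -0.5 * np.einsum("si,ik,sk->s", spins, J, spins)
    log_z = np.logaddexp.reduce(-beta * energies)   # numerically stable ln Z_J
    return -log_z / (N * beta)

rng = np.random.default_rng(0)
beta = 1.0
for N in (6, 10, 14):
    samples = [sk_free_energy_per_spin(sample_couplings(N, rng), beta) for _ in range(20)]
    # mean and sample-to-sample spread of the free energy per spin
    print(N, np.mean(samples), np.std(samples))
```

For such small $N$ finite-size effects are large, so the printed spread can only hint at the behavior expected for $N \to \infty$.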

The direct evaluation of $\langle \ln Z_J \rangle_{qu}$ is difficult, but it can be considerably simplified by means of the replica method.
This was independently discovered several times (see discussions in Refs. [31], [95]) but has been well known only since
its application to the spin glass problem by Edwards and Anderson [59]. The first step of this method consists in what
has become known as the "replica trick",

$$\lim_{n\to 0} \frac{x^n - 1}{n} = \ln x. \qquad (2.5)$$
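
As a quick numerical illustration of the limit (2.5), the sketch below compares $(x^n - 1)/n$ with $\ln x$ for decreasing $n$, applied to an average over sampled positive numbers. The array `Z` is a hypothetical stand-in for partition sums (log-normal samples chosen only for illustration); it is not derived from the SK model.

```python
import numpy as np

def replica_log(x, n):
    """Finite-n approximation (x**n - 1)/n to ln(x), cf. the replica trick (2.5)."""
    return (np.asarray(x, dtype=float) ** n - 1.0) / n

rng = np.random.default_rng(0)
Z = np.exp(rng.normal(0.0, 1.0, size=1000))   # hypothetical positive "partition sums"

for n in (1.0, 0.1, 0.01, 0.001):
    # (<Z^n> - 1)/n tends to the quenched average <ln Z> as n -> 0,
    # which in general differs from the annealed value ln <Z>.
    estimate = (np.mean(Z ** n) - 1.0) / n
    print(n, estimate, np.mean(np.log(Z)), np.log(np.mean(Z)))
```

By Jensen's inequality the quenched average $\langle \ln Z \rangle$ never exceeds the annealed value $\ln \langle Z \rangle$, which the last two printed columns illustrate.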



Thus Eq.(2.4) can be rewritten as 



$$f_{SK} = \lim_{N\to\infty} \lim_{n\to 0} \frac{1 - \langle Z_J^n \rangle_{qu}}{N \beta n}. \qquad (2.6)$$

The name replica refers to the fact that the $n$-th power of $Z_J$ is the partition function of $n$ non-interacting, identical
replicas of the original system. The average $\langle \ldots \rangle_{qu}$ will create interactions between the replicated systems. As a second
step we interchange the two limits in (2.6), which has been proved valid for the SK model by van Hemmen and Palmer
[95]. A further step consists in the assumption that it is sufficient to evaluate $\langle Z_J^n \rangle_{qu}$ for integer $n$ and then interpret
$n$ as a real variable ("continuation") in order to evaluate the limit $n \to 0$. The point is that the averaged
partition sum $\langle Z_J^n \rangle_{qu}$ with integer $n$-s can be technically tackled, while with non-integer $n$ it is as intractable as the
$\langle \ln Z_J \rangle_{qu}$ of Eq. (2.4). The fourth step is the evaluation of $\langle Z_J^n \rangle_{qu}$ by means of a saddle point approximation, becoming
exact as $N \to \infty$. The detailed calculations along this program are given in [60, 61] with the result



$$f_{SK} = \lim_{n\to 0} \frac{1}{n} \min_Q f_{SK}(Q), \qquad (2.7)$$

$$f_{SK}(Q) = -\frac{n\beta}{4} + \frac{\beta}{2} \sum_{a<b} q_{ab}^2 - \beta^{-1} \ln Z_{\beta Q}. \qquad (2.8)$$







Here the minimization, stemming from the saddle point approximation, runs over all symmetric $n \times n$ matrices $Q$
with elements $q_{aa} = 1$ and $-1 \le q_{ab} \le 1$ ($a, b = 1, 2, \ldots, n$ being the replica indices). Furthermore, $Z_{\beta Q}$ is formally
identical to $Z_J$ if one sets $N = n$ and $J = \beta Q$, a specialty of the SK model. The function $f_{SK}(Q)$ is often referred to
as the replica free energy.
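
For small integer $n$ the replica free energy can be evaluated by brute force, which makes the statement about $Z_{\beta Q}$ concrete. The sketch below is an illustration under stated assumptions: it takes the replica free energy in the form $f_{SK}(Q) = -n\beta/4 + (\beta/2)\sum_{a<b} q_{ab}^2 - \beta^{-1} \ln Z_{\beta Q}$, with $Z_{\beta Q}$ the partition sum of $n$ Ising spins coupled by $\beta q_{ab}$ on pairs $a<b$. The helper names `rs_matrix` and `replica_free_energy` are hypothetical.

```python
import itertools
import numpy as np

def rs_matrix(n, q):
    """n x n matrix with unit diagonal and all off-diagonal elements equal to q."""
    return np.full((n, n), float(q)) + (1.0 - q) * np.eye(n)

def replica_free_energy(Q, beta):
    """Assumed form of f_SK(Q) for integer n, with Z_{beta Q} enumerated exactly."""
    n = Q.shape[0]
    a, b = np.triu_indices(n, k=1)                       # index pairs a < b
    spins = np.array(list(itertools.product([-1, 1], repeat=n)))
    # Z_{beta Q} = Tr exp(beta * sum_{a<b} (beta q_ab) S_a S_b): Z_J with N = n, J = beta Q
    exponent = beta**2 * (spins[:, a] * spins[:, b]) @ Q[a, b]
    log_z = np.logaddexp.reduce(exponent)
    return -n * beta / 4.0 + 0.5 * beta * np.sum(Q[a, b] ** 2) - log_z / beta

beta = 0.8
for n in (1, 2, 3, 4):
    # f_SK(Q)/n for a matrix with q_ab = 0.2 off the diagonal
    print(n, replica_free_energy(rs_matrix(n, 0.2), beta) / n)
```

With $q_{ab} = 0$ the replicas decouple and $f_{SK}(Q)/n$ reduces to $-\beta/4 - \beta^{-1} \ln 2$ for every $n$, which is a convenient sanity check of the assumed formula.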



The practical meaning of (2.7) can be understood as follows. A direct analytical evaluation of the minimum in (2.7)
for arbitrary integer $n$ is typically not feasible. Therefore, one introduces an ansatz for $Q$ with a set of variational
parameters $\lambda$ that leads to formulas explicitly containing $n$, so that continuation of formulas containing the elements
of $Q$ to real $n$-values becomes feasible. Then (2.7) is to be understood as first a minimum condition for general $n$,
obtained by differentiating the replica free energy $f_{SK}(Q)$ with respect to the matrix elements $q_{ab}$, the so-called
stationarity condition, together with the requirement of at least the absence of negative eigenvalues of the second
derivative matrix, the Hessian, of $f_{SK}(Q)$, i.e., the condition of local thermodynamic stability. (Here we disregard the
border case when the minimum does not satisfy stationarity, and the situation when there may be several locally stable
states. Interestingly, in the SK model these cases do not occur, but they do in other systems.) These relations cannot be
continued to $n \to 0$ without further parametrization. But insertion of the ansatz with the variational parameters $\lambda$
allows for the limit $n \to 0$. In this light (2.7) does not prescribe a customary minimization; rather, it defines the minimum
condition consisting of the aforementioned stationarity and stability relations, which await parametrization.

On the other hand, we can reverse the order of parametrizing and minimum search. The parametrization should
allow us to construct $f_{SK}(Q(\lambda))$ for any $n$. The minimization condition for integer $n$ with respect to the variational
parameters $\lambda$ implies, in the generic case, the vanishing of the derivatives, and is supposed to admit continuation to
real $n$-values. Closer inspection shows [14], [31] that after such a continuation, in the limit $n \to 0$, the condition of local
stability described above will no longer correspond to a local minimum of $f_{SK}(Q(\lambda))$ but rather to a local maximum.

This can be crudely understood when one realizes that the second term on the r.h.s. of (2.8) contains $\binom{n}{2}$
independent terms, equal to the number of order parameters. The $\binom{n}{2}$ changes sign when $n$ passes from $n > 1$ to $n < 1$,
so for $n < 1$ one has formally a negative number of order parameters $q_{ab}$. This does not, however, cause confusion,
because due to the parametrization of the matrix $Q$ we do not need to work with the elements $q_{ab}$ for $n < 1$. A similar
sign change of terms obtained by expanding the third term in (2.8) changes the nature of the extremum of the free
energy from minimum to maximum.
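
The sign change of the pair count can be made explicit with a few lines of arithmetic. The sketch below continues $\binom{n}{2} = n(n-1)/2$ to real $n$ and, as an illustration, evaluates a quadratic term of the assumed form $(\beta/2)\sum_{a<b} q_{ab}^2$ per replica when all $q_{ab}$ equal a single value $q$. The function names are hypothetical.

```python
def pair_count(n):
    """Number n(n-1)/2 of independent elements q_ab (a < b), continued to real n."""
    return n * (n - 1) / 2.0

def rs_quadratic_term_per_replica(n, q, beta):
    """(1/n) * (beta/2) * sum_{a<b} q^2 = beta * (n - 1) * q^2 / 4 for q_ab = q."""
    return beta * (n - 1) * q**2 / 4.0

for n in (3, 2, 1.5, 1, 0.5, 0.1, 0.0):
    # the quadratic term is positive for n > 1 and negative for n < 1
    print(n, pair_count(n), rs_quadratic_term_per_replica(n, q=0.4, beta=1.0))
```

A term that is convex in $q$ for $n > 1$ thus becomes concave for $n < 1$, in line with the reversal from minimum to maximum described above.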

The above reasoning thus leads, within a given parametrization, to 

$$f_{SK} = \max_{\lambda} \lim_{n\to 0} \frac{1}{n} f_{SK}(Q(\lambda)). \qquad (2.9)$$

This formula prescribes global maximization in $\lambda$. So if several local maxima are found, the $f_{SK}$ values there should be
compared, and the global maximum within the given parametrization is thus well defined. However, we are in principle
still not allowed to bypass the aforementioned local stability analysis, because a global maximum as in (2.9), within
a given parametrization, may still be unstable with respect to changes in the $q_{ab}$ matrix elements. Thus one should
evaluate the spectrum of the Hessian matrix of $f_{SK}(Q)$ and require that no negative eigenvalues exist in the limit
$n \to 0$. This leads to two prescriptions that are at first sight contradictory, namely, the minimization in (2.7), formulated
as the absence of negative eigenvalues of the Hessian of $f_{SK}(Q)$, and the maximization of the parametrized free energy in
(2.9). Closer inspection shows, however, that there is no logical contradiction. Indeed, maximization in the restricted
space of the variational parameters generically requires the negative semidefiniteness of another Hessian, the one for
$f_{SK}(Q(\lambda))|_{n=0}$. In special cases one can show that some eigenvalues of the Hessian of $f_{SK}(Q)$ correspond to
those of the Hessian of $f_{SK}(Q(\lambda))|_{n=0}$, such that non-negativity for the former implies non-positivity for the latter
[61, 96, 98]. Following the reasoning in Section 3.3 of Ref. [84], this can be intuitively understood in the way that
the infinitesimal increment around an extremum of $f_{SK}(Q)$ is the sum of contributions negative in number for $n < 1$,
responsible for the reversal of the type of extremum. For a more recent discussion of the problem of maximization in
a descendant of the SK model see Ref. Ml .



Aiming at an exact solution of the original minimization problem (2.7), one should choose a variational ansatz that
includes the global solution. In principle, a parametrization should be adopted that gives a maximal
$f_{SK}$ value over all possible parametrizations. Verification of the global nature of a maximum found within a given
parametrization is a hard problem; physical intuition for the right parametrization and comparison with reliable
simulation data, if such exist, may be of guidance.



Considering that the replicated partition sum in (2.6) is symmetric under permutation of the replicas, a first guess
is that the minimizing $Q$-matrix in (2.7), characterizing the state of the system at equilibrium, also exhibits this
symmetry. This leads us to the replica symmetric (RS) ansatz with a single variational parameter $\lambda = q = q_{ab} \in [-1, 1]$
for all $a \ne b$, named the Edwards-Anderson order parameter. The explicit evaluation of (2.9) with such an ansatz and
clarification of the physical content of the resulting RS solution has been performed in Refs. [60], [61]. (For the sake of
brevity we do not discuss the inclusion of an external magnetic field and that of a nonzero average of the couplings $J_{ik}$;
some main concepts can be presented without them.) The local stability conditions for the RS solution have been
worked out by de Almeida and Thouless (AT) in [Q. It turns out that the AT stability condition is fulfilled only
for temperatures above the critical value $k_B T_c = 1$; below that the RS solution is AT-unstable, implying that the replica
symmetry of the system in (2.6) must be spontaneously broken by the equilibrium state of the system. Intriguingly,
this instability does not announce itself at any integer $n$; it only appears as $n$ decreases from 1 towards 0 [96], [95], [100].
Further evidence that the RS solution is incorrect comes from the negative ground state entropy [60] and
magnetic susceptibility [101], and from its predictions for the ground state energy and the probability density of the local
magnetic field, which contradict simulations, see |Iif| .
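
The RS solution and its AT instability can be checked numerically in the zero-field case. The sketch below relies on the standard textbook forms, which are not written out in this section and are therefore stated here as assumptions: the RS saddle point equation $q = \int Dz \tanh^2(\beta \sqrt{q} z)$ and the AT stability condition $\beta^2 \int Dz \, \mathrm{sech}^4(\beta \sqrt{q} z) \le 1$, with $Dz$ the standard Gaussian measure and units $k_B = 1$. Gaussian averages are approximated by Gauss-Hermite quadrature; all function names are illustrative.

```python
import numpy as np

# Nodes and weights for the weight function exp(-z^2/2); normalizing by sqrt(2*pi)
# turns the quadrature sum into an average over a standard Gaussian variable.
_nodes, _weights = np.polynomial.hermite_e.hermegauss(80)
_weights = _weights / np.sqrt(2.0 * np.pi)

def gauss_avg(f):
    """Approximate the Gaussian average of f(z) by Gauss-Hermite quadrature."""
    return float(np.sum(_weights * f(_nodes)))

def rs_order_parameter(T, q0=0.5, iters=2000):
    """Fixed-point iteration of q = <tanh^2(beta sqrt(q) z)> (assumed RS equation)."""
    beta, q = 1.0 / T, q0
    for _ in range(iters):
        q = gauss_avg(lambda z: np.tanh(beta * np.sqrt(q) * z) ** 2)
    return q

def at_stable(T, q):
    """Assumed AT condition at zero field: beta^2 <sech^4(beta sqrt(q) z)> <= 1."""
    beta = 1.0 / T
    return beta**2 * gauss_avg(lambda z: np.cosh(beta * np.sqrt(q) * z) ** -4.0) <= 1.0

for T in (1.5, 1.2, 0.8, 0.5):
    q = rs_order_parameter(T)
    print(T, q, at_stable(T, q))
```

In this sketch $q$ iterates to zero above $T = 1$, where the AT condition holds, while below $T = 1$ a positive $q$ is found and the condition is violated.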

In order to find a consistent description of the SK model at low temperatures, several replica symmetry breaking
(RSB) parametrizations for the $Q$ matrix in (2.7) have been proposed [102-107]. In what can be viewed as the
generalization of Blandin's one-step RSB (1-RSB) [102], Parisi formulated on physical grounds a hierarchical structure
for the $Q$ matrix (see also Sect. III.3 in ]T^]) and introduced the so far only RSB ansatz compatible with these
conditions in an ingenious series of works [108, 109, 20, 23, 110]. Depending on the number $2R + 1$ of variational
parameters $\lambda$ in this ansatz, one speaks of an $R$-step, $R = 0, 1, 2, \ldots$, RSB ansatz ($R$-RSB), and of continuous RSB
(CRSB) in the limit $R \to \infty$. The RS ansatz corresponds to $R = 0$, and each higher step contains the previous
ones as special cases. Later in the paper the explicit form of Parisi's ansatz for $Q$ will be given and its consequences
thoroughly discussed. Following Parisi's study, mostly focusing on the region near criticality, the deep spin glass phase
was also extensively analyzed within the CRSB ansatz, see Refs. in |lJj8J]. We highlight among the non-perturbative
approaches the work of Sommers and Dupont [29], where a variational free energy especially suited for numerical
evaluation was constructed and used to resolve ground state properties. One of their successes was a theoretical
prediction for the probability density of the local field that compared favorably to the simulation of Palmer and Pond
(see Fig. III.6 of |Q). The generalization of the AT stability conditions to the case of an $R$-RSB solution has been
developed in a series of works by De Dominicis, Kondor, and Temesvari, initiated with Ref. [98] and presented in the
most general form in Ref. [111]. Due to the complicated form of these stability conditions, they could so far be verified
for Parisi's solution only slightly below the AT instability. Yet it is widely believed that Parisi's solution captures the
correct behavior of the SK model in the entire low temperature regime.
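
To make the hierarchical structure concrete before the explicit form is given later in the paper, the sketch below builds an $R$-step block matrix for integer $n$ in the commonly used convention: nested diagonal blocks of sizes $m_1 > m_2 > \ldots > m_R$ carry the values $q_1, \ldots, q_R$, the remaining off-diagonal elements equal $q_0$, and the diagonal is 1. This convention and the function name `parisi_matrix` are assumptions for illustration; the paper's own parametrization may be labeled differently.

```python
import numpy as np

def parisi_matrix(n, block_sizes, q_values):
    """Hierarchical n x n matrix with R = len(block_sizes) levels of nested blocks.

    block_sizes: m_1 > m_2 > ... > m_R, each dividing the previous one and n.
    q_values:    q_0, q_1, ..., q_R (R + 1 values), so 2R + 1 parameters in total.
    """
    if len(q_values) != len(block_sizes) + 1:
        raise ValueError("need one more q value than block sizes")
    Q = np.full((n, n), float(q_values[0]))
    for m, q in zip(block_sizes, q_values[1:]):
        if n % m != 0:
            raise ValueError("block size must divide n")
        for start in range(0, n, m):
            Q[start:start + m, start:start + m] = q   # overwrite inner blocks
    np.fill_diagonal(Q, 1.0)
    return Q

# RS (R = 0), 1-RSB and 2-RSB examples for n = 8
print(parisi_matrix(8, [], [0.3]))
print(parisi_matrix(8, [4], [0.1, 0.6]))
print(parisi_matrix(8, [4, 2], [0.1, 0.4, 0.7]))
```

With an empty list of block sizes the construction reduces to the RS matrix, and each additional level contains the previous ones as the special case of equal $q$ values, in line with the nesting described in the text.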

The global stability of Parisi's RSB ansatz has not been verified by rigorous mathematics. It is physically supported 
in part by the suggestive picture of hierarchical organization of states in the glassy phase. Furthermore, it shows none 
of the aforesaid inconsistencies that plagued the RS solution, and it compares satisfactorily with simulations. In
fact, we do not know of any instance where the replica method with Parisi's ansatz has been applied and, at the same
time, well founded analytical or numerical approaches are available and yield incompatible results. Neither is a
case known to us which admits application of the replica method but cannot be handled in a self-consistent way by 
Parisi's ansatz with sufficiently many, possibly infinitely many, steps of RSB. 

As an alternative to the replica method, Thouless, Anderson, and Palmer [112] have established a modified form
of the Bethe-Peierls method reproducing the RS results at high temperatures, while differing from both the RS and
Parisi's solution in the AT-unstable region. This approach has been further developed by Sommers [113, 114] in a
way that was later realized [106, 115] to be equivalent, in a certain limit, to a generalized version of the RSB ansatz
by Blandin and coworkers [102, 105]. A second alternative method is the dynamical approach of Sompolinsky and
Zippelius [22, 23], capturing Parisi's solution in the static case [23, 116]. The latter may in turn be reproduced by
an iterative extension of the Blandin-Sommers scheme [106], the first step towards the correct Parisi solution. A
further modified form of the Bethe-Peierls approach, the so-called cavity approach of Mezard, Parisi, and Virasoro
[14, 117], contains the Thouless, Anderson, and Palmer equations as a special case but can also be extended to become
equivalent to a Parisi ansatz with an arbitrary number of RSB steps. Again, this is not a mathematically rigorous
method but rather an ansatz in combination with an intuitive physical line of reasoning, verified by self-consistency
in the end. While the physical picture is less elusive than that behind the formal $n \to 0$ limit, the equivalent replica
method in conjunction with Parisi's ansatz seems to be more highly developed as far as applicability for practical
calculations is concerned. For instance, the self-consistency condition of the cavity approach, expected to be equivalent
to the thermodynamic stability conditions of the replica method [14], has so far been explicitly worked out only in
the simplest case, corresponding to the AT stability condition for the RS state. Another formulation of the dynamics
was given by Sommers [36], who devised a path-integral approach specially suited for discrete variables like Ising
spins. His results are in accordance with those of Sompolinsky and Zippelius, who used a continuous spin model
that, in a singular limit, also covered the case of Ising spins. A recently suggested alternative method [118] studies
the $n$-dependence of $\langle Z^n \rangle$ and reiterates that different continuations to $n \to 0$ give the RS and RSB solutions, without
the need of explicitly inserting Parisi's ansatz. However, owing to the heuristics involved, the exact solution may be
obtained only in special cases.

There is a large family of spin glass models, consisting of various generalizations of the SK model, that have also
been successfully treated by Parisi's ansatz, albeit mostly near criticality in a perturbative manner p4| . A prominent
exception is Nieuwenhuizen's multi-p-spin interaction model with continuous, spherical spins p8[. The fixed $p = 2$
case is the long known spherical SK model, which can be solved within RS flS4j ; with multi-p-spin interactions, however,
it can exhibit RSB. Remarkably, in CRSB phases the continuously increasing part of Parisi's order parameter function
can be analytically calculated for any temperature. Even with a fixed $p > 2$, one can also have phases where the
1-RSB solution is exact, a situation discussed for the neuron with Ising couplings later in this paper. The multi-p-spin
model has also become a test bed for equilibrium thermodynamic calculations meant to capture asymptotic states of
dynamics not maximizing the free energy [99].

It is well known that for a ferromagnet, the symmetry of the system as a whole, i.e., of the Hamiltonian, is
spontaneously broken by the state of the system at thermal equilibrium, accompanied by a spontaneous breaking of
ergodicity [119]. Such a state can be reached by decreasing the temperature, when the system undergoes a
transition from a paramagnetic phase, exhibiting macroscopically spherical symmetry and ergodicity, to a ferromagnet,
with only axial symmetry and restricted ergodicity. In the SK model described by the replica free energy (2.7),
as temperature decreases, an analogous phase transition from a paramagnetic into a spin glass phase takes place
[pi, 120, 121, 57], with a concomitant spontaneous breaking of ergodicity and of RS. The transition can be monitored
by Parisi's variational parameters $\lambda$ at stationarity, which thus play the role of order parameters [20, 110, 122]. The
emerging intuitive picture of RSB is that of a very complicated, rugged free energy landscape in some coarse grained
state space, with a large number of local minima, many of them nearly degenerate, as well as a number of global
minima, separated by free-energy barriers whose height diverges in the thermodynamic limit. What is a thermal
equilibrium state in ordered systems corresponds here to a global minimum, also termed an ergodic component,
a pure thermodynamical state, or a metastate. Within the Parisi solution pure states are organized according to a
hierarchical, so-called ultrametric topology [30, 28, 123]. The ultrametric decomposition of the state space into pure
states, from the practical viewpoint, helps in the calculation of non-self-averaging quantities [27], |2^], and is also a basic
ingredient of the cavity approach in [14, 117]. However, so far it has withstood rigorous mathematical treatment, and as
to real spin glasses, it is the subject of ongoing controversy [91, 124]. We would like to add here that, in the context
of neural networks, examples are known [125-127] where there are multiple ground states, and they are grouped into
disconnected regions, i.e., ergodicity is broken, while the replica method implies that RS is preserved. The aforesaid
physical picture of RSB can be maintained by distinguishing between pure states and ergodic components [125];
furthermore, it is unclear whether it is a spontaneous symmetry breaking that takes place in those networks. In the
present manuscript we do not deal with such subtleties, and concentrate mainly on the replica method as a tool for
calculation.
The replica approach in conjunction with Parisi's ansatz provides so far the most complete description of the SK
model in averaged thermal equilibrium. However, this scheme, as well as the equivalent cavity approach and the
static limit of the path-integral formulation, involves certain procedures which, up to now, could not be put on a
rigorous mathematical basis. On the other hand, there exist a number of remarkable rigorous results concerning the
SK model: in Ref. [128] it was shown that the quenched average $N^{-1} \langle \ln Z_J \rangle_{qu}$ approaches the so-called annealed
average $N^{-1} \ln \langle Z_J \rangle_{qu}$ in the thermodynamical limit (termed strong self-averaging property) above the AT-line and
in the absence of an external magnetic field. The evaluation of $N^{-1} \ln \langle Z_J \rangle_{qu}$ is straightforward and reproduces the
RS solution. The basic reason behind these conclusions is the vanishing of the Edwards-Anderson order parameter,
so that the usual effective coupling of the replicas after averaging out the quenched disorder does not arise, i.e.,
$\langle Z_J^n \rangle_{qu} = \langle Z_J \rangle_{qu}^n$. Furthermore, some explicit bounds pertaining to the low temperature region have been obtained in
[128] which imply [92] the existence of a phase transition at the same temperature as predicted by the AT-stability
criterion. In [92] it was shown by means of a rigorous version of the cavity procedure, called the martingale method in the
mathematical physics literature, that if the Edwards-Anderson order parameter is self-averaging then the RS solution
is exact. In [129] it was rigorously verified that this order parameter is self-averaging, and thus the RS solution is exact,
if the AT stability condition is fulfilled without an external magnetic field, and also under a condition slightly stronger
than the AT condition in the presence of a field. In view of this theorem, it is suggestive that an AT-stable RS solution
will provide the correct result also in other systems. It furthermore confirms Parisi's RSB ansatz to the extent that
this ansatz reduces to the RS result if the AT condition is satisfied. Finally, from the previously discussed evidence as
well as the rigorous mathematical proof in [128] that the RS solution is incorrect at low temperatures, it follows
that the Edwards-Anderson order parameter is not self-averaging. This feature is indeed reproduced by the Parisi
solution. Another interesting rigorous result has been obtained in Refs. [130, 131] via the martingale method, namely
that there exists a set of "order parameter functions" $0 \le x(q) \le 1$ such that the SK free energy can be expressed in
terms of antiparabolic martingale equations, each of them involving one such function $x(q)$ and being exactly of the
same form as the non-linear partial differential equation in Parisi's CRSB scheme. The remaining non-trivial step in
order to complete a rigorous derivation of Parisi's CRSB solution is to show that this set of functions is effectively
equivalent to a single function $x(q)$. Finally, in [132, 133] certain rather strong conditions are derived that should be
satisfied by the order parameter of a class of spin glass models, including the SK but also short ranged models.
These constraints are indeed fulfilled by Parisi's solution but still leave room for other possibilities.

We remark that the replica method in combination with the Parisi ansatz is not restricted to the SK model and its
variants; this is also one of the main reasons why this paper was written. Nevertheless, most of the above rigorous
results pertain to the SK model, only some of them have so far been generalized to the Little-Hopfield network, and
none but the last one to even further systems.



D. Little-Hopfield network 



One of the main breakthroughs of the statistical physical approach to other fields was achieved on the Little-Hopfield
model by the replica calculation of Amit, Gutfreund, and Sompolinsky [134-136]. They considered $M$ randomly
sampled patterns $S^\mu$, $\mu = 1, 2, \ldots, M$, each of dimension $N$, where $N$ is the number of participating neurons, for a
fixed value of the so-called load parameter

$$\alpha = M/N \qquad (2.10)$$

in the thermodynamical limit $N \to \infty$. The starting point of the statistical mechanical treatment is a canonical
Boltzmannian formulation of the problem. A microstate is a configuration of the neuron states $S_i$, $i = 1, \ldots, N$, and a
pattern is considered as stored if it is a stable fixed point attractor of the dynamics (2.2). The energy function for the
random, sequential dynamics (2.2) is analogous to the Hamiltonian of the SK model in (2.3) [ p4[ . The main difference
is in the exchange couplings, taken now as $J_{ij} = N^{-1} \sum_{\mu=1}^{M} S_i^\mu S_j^\mu$, called the Hebb rule; thus the patterns play the
role of the quenched disorder. At positive temperatures the update rule of the dynamics (2.2) for the selected neuron
is non-deterministic; usually Glauber's prescription is applied, see, e.g., Ref. p4J . The original storage problem
corresponds to the zero temperature limit.
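
The storage mechanism can be illustrated with a small simulation at zero temperature. The sketch below is a minimal example under stated assumptions: the update rule (2.2), which is not reproduced in this section, is taken to be $S_i \leftarrow \mathrm{sign}(\sum_j J_{ij} S_j)$ applied to neurons in random sequential order, the self-couplings $J_{ii}$ are set to zero, and the load $\alpha = M/N = 0.025$ is chosen well below capacity. All names are illustrative.

```python
import numpy as np

def hebb_couplings(patterns):
    """Hebb rule J_ij = N^(-1) sum_mu S_i^mu S_j^mu, with J_ii set to zero (assumption)."""
    M, N = patterns.shape
    J = patterns.T @ patterns / N
    np.fill_diagonal(J, 0.0)
    return J

def sequential_sweep(J, S, rng):
    """One zero-temperature sweep: each neuron aligns with its local field, in random order."""
    S = S.copy()
    for i in rng.permutation(len(S)):
        S[i] = 1 if J[i] @ S >= 0 else -1
    return S

rng = np.random.default_rng(1)
N, M = 200, 5
patterns = rng.choice([-1, 1], size=(M, N))
J = hebb_couplings(patterns)

# Start from pattern 0 with 10% of the neurons flipped and relax the network.
S = patterns[0].copy()
S[rng.choice(N, size=N // 10, replace=False)] *= -1
for _ in range(5):
    S = sequential_sweep(J, S, rng)

overlap = S @ patterns[0] / N   # overlap 1 means perfect retrieval
print(overlap)
```

At such a low load the corrupted state is expected to relax to a state with overlap close to one with the stored pattern; near the critical load the overlap would drop, as discussed below.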

Within the RS ansatz Amit, Gutfreund, and Sompolinsky obtained as a central result that the maximal number $M_c$ of
patterns that can be stored with an error of a few percent scales as $M_c = \alpha_c N$ in the thermodynamical limit $N \to \infty$,
with a critical capacity $\alpha_c = 0.138$. Criticality manifests itself by the drop of the overlap of a generic stationary state
with the desired pattern from a value below, but close to, one to nearly zero. It was immediately noticed [134]
that the AT stability condition is violated at zero temperature for all $\alpha > 0$, thus for exact results RSB is required,
but already a quite small temperature restores the AT stability and thus the validity of the RS solution.

Applying the 1-RSB ansatz, Crisanti, Amit, and Gutfreund [137] obtained a modified critical capacity of $\alpha_c \simeq 0.144$.
The problem was reconsidered in the $R$-RSB, $R = 0, 1, 2$, analysis of Steffan and Kühn [138], who put forth a ground
state capacity $\alpha_c \simeq 0.1382$ based on several cross-checks of their computation. The authors raise the possibility
that the Parisi-Toulouse hypothesis [139], implying that in a CRSB solution the magnetization in the SK model does
not depend on the temperature, believed to be exact for vanishing magnetization, holds also in the Little-Hopfield
model, at least as a good approximation. In that case, they conclude, the capacity is given by the intersection of the
AT line and the RS phase boundary, that is, the capacity is essentially the one calculated from the RS solution.

A CRSB calculation, an extension of Parisi's solution of the SK model within the formalism of Ref. [116], was
performed by Tokita pi| . The sophisticated numerical method applied to evaluate the CRSB equations showed an
instability near $\alpha_c \simeq 0.155 \pm 0.002$, which he identified as the capacity. Numerical simulations [54, 134, 137, 140] gave
estimates mostly between the aforesaid finite $R$-RSB and Tokita's CRSB results. However, a more recent simulation
[141], including a finite-size scaling specially adapted for a discontinuous transition in the presence of quenched
disorder, yielded $\alpha_c = 0.141 \pm 0.0015$, in better agreement with the former result. Given that the numerical
evaluation of the CRSB state to the required precision is a much more formidable task than that of $R$-RSB, $R = 0, 1, 2$,
and that even 1-RSB computations were the subject of debate [137, 138], the question of theoretical prediction
may still be considered open. The main issue here is less the precise number; the interesting questions are rather
the salient features of the phase diagram like reentrance, the validity of the Parisi-Toulouse hypothesis, or what kind
of RSB describes the various phases [138], pT| .



Tokita's framework involving the freedom of a gauge function is closely related to the variational approach for
the SK model [116, 29], inspired in turn by dynamical studies [23], where the static gauge function is related to the
time-dependent susceptibility. The variational framework we present in Section VII on purely static grounds turns
out to be very similar to those, albeit without resorting to the gauge function. On the technical side, we are
unaware of any non-perturbative CRSB analysis that aims at the ground state, or at least at regions with frustration
far from criticality, beyond those performed for the SK model and its descendants, as well as the related Little-Hopfield
model. Filling this hiatus was an important motivation for the present paper.



The RS results of Amit, Gutfreund, and Sompolinsky have been re-derived in several different ways [142-146],
based on certain assumptions which are possibly equivalent to that of RS. Alternative methods comparable to RSB,
however, do not seem to be available yet. The authors of Ref. [144] speculate that their framework may admit such
an extension, being based on Sommers' dynamical path-integral approach [36], which successfully reproduced some
RSB features in the SK model.

The following mathematically rigorous results for the Little-Hopfield model are so far available. The self-averaging
property of the free energy density has been proven in [93, 94]. In [147] the RS solution is rigorously derived under the
assumption that the Edwards-Anderson order parameter is self-averaging, and in [129] the latter assumption is shown
to hold under a condition similar to, but somewhat stronger than, the AT stability condition. Finally, a constraint
similar to ultrametricity on the order parameter has been derived in [132, 133], which is indeed satisfied by the RS
solution at high temperatures and Tokita's CRSB solution at low temperatures.



E. Pattern storage by a single neuron 



As we have seen, the McCulloch-Pitts model neuron is the elementary building block of two prominent types of
neural networks, the layered, feedforward perceptron and the associative memory. Therefore the detailed exploration
of such a single neuron is an indispensable prerequisite for a satisfactory understanding of the collective behavior of
networked units.



1. Continuous synaptic coupling

Firstly we describe the case of continuous synaptic couplings, i.e., arbitrary vectors $J$ in (2.1). If their norm is fixed
then the term spherical couplings is often used. Note that in Eq. (2.1) the norm does not influence the output. An
early remarkable result is due to Winder [148] and Cover [149] regarding the maximal number $M_c$ of input patterns
for which a single McCulloch-Pitts neuron can correctly reproduce the prescribed outputs according to (2.1). This is
understood as a theoretical maximum, i.e., without reference to any specific training algorithm that may be necessary
to find the right couplings. For randomly sampled patterns $S^\mu$, $\mu = 1, 2, \ldots, M$, their critical capacity $\alpha_c = M_c/N$ in the
limit $N \to \infty$ approaches, with probability 1, the value $\alpha_c = 2$, a widely referenced result in artificial neural networks.
An easy to follow account of Cover's geometrical proof, for arbitrary $N$, can be found in Sect. 5.7 of [19], and notable
extensions have been worked out in [150-152].
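
The capacity $\alpha_c = 2$ can be seen directly from Cover's counting function. The sketch below uses the standard statement of that result, quoted here as background rather than taken from this section: for $M$ points in general position in $N$ dimensions, the number of dichotomies realizable by a homogeneous linear threshold unit is $C(M, N) = 2 \sum_{k=0}^{N-1} \binom{M-1}{k}$, out of $2^M$ possible ones. The function names are illustrative.

```python
from fractions import Fraction
from math import comb

def cover_dichotomies(M, N):
    """Cover's count C(M, N) = 2 * sum_{k=0}^{N-1} binom(M-1, k), for M >= 1."""
    return 2 * sum(comb(M - 1, k) for k in range(N))

def separable_fraction(M, N):
    """Fraction of the 2^M output assignments that a single neuron can realize."""
    return Fraction(cover_dichotomies(M, N), 2**M)

N = 50
for alpha in (0.5, 1.0, 1.5, 2.0, 2.5, 3.0):
    M = int(alpha * N)
    # fraction of realizable dichotomies as a function of the load alpha = M/N
    print(alpha, float(separable_fraction(M, N)))
```

The fraction equals one for $M \le N$, is exactly $1/2$ at $M = 2N$ for every $N$, and the crossover around $\alpha = 2$ sharpens into a step as $N$ grows.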



A central notion for adaptive networks is the version space. This is the set of coupling vectors J compatible with 
the patterns, or, examples. Intuitively it is clear that the version space shrinks as the number of patterns increases, 
and beyond the capacity the version space is empty, at least with probability one in the thermodynamical limit. 

A breakthrough was achieved when the space of synaptic couplings of a single McCulloch-Pitts neuron was explored,
following the proposition of Gardner [?], by Gardner and Derrida within both the microcanonical [?] and canonical
[?] approaches. A main novelty of the concept was in reversing the traditional analogy between spin systems and
neural networks. In the Little-Hopfield model the states of the neurons form the "spin space", and the synaptic
couplings are the quenched parameters. The new proposition was to consider the couplings as the configuration space
for statistical mechanics, with constraints represented by randomly generated patterns to be stored, i.e., patterns
that should be reproduced by an appropriate setting of the couplings; that is, to consider the version space. By the
introduction of an appropriate cost or energy function in coupling space (further synonyms are Hamiltonian function
and error measure), the stage was set for the statistical mechanical treatment. This does not restrict the study to
the version space but also allows for finite temperatures, so that beyond capacity it provides a framework to describe
states with a given error, including the minimal positive error of the ground state. The common ingredient in both the
Little-Hopfield and the Gardner-Derrida concepts is that patterns, i.e., examples, represent the quenched disorder;
otherwise they are quite different. For example, while the energy function of the Little-Hopfield network closely
resembles that of the SK model, not much formal analogy exists between spin systems and synaptic coupling space. In
what was a novel application of the replica method, within the RS ansatz, Gardner and Derrida reproduced, and
generalized to biased pattern distributions, the Winder-Cover result. They calculated many characteristics of the
region below the critical capacity $\alpha_c$ and also proved convergence of training algorithms. We note here that
the traditional problem of error-free storage corresponds to the condition of zero energy in the ground state. If not
all patterns can be accommodated by the couplings, that is, the neuron is beyond capacity, then, depending on the
choice of the Hamiltonian, various positive ground state energies arise.

The thermodynamical stability of the RS solution via the AT condition [90] was formulated here by Gardner and
Derrida [?] and revised later by Bouten [?]. It turned out that the RS ansatz beyond the critical capacity
$\alpha_c = 2$ is unstable for the much studied energy function that measures the number of patterns that are not
stored, i.e., of unstable patterns. This is sometimes called the Gardner-Derrida error measure and will be in our
focus in the present paper. An improved 1-RSB ansatz by Majer, Engel, and Zippelius [?] and by Erichsen and
Theumann [?], as well as the subsequent 2-RSB calculation by Whyte and Sherrington [?], turned out to be still
plagued by similar instability beyond capacity. The latter authors could prove that no finite $R$-RSB ansatz in the
ground state, beyond capacity, may possibly be locally stable. In the present article we propose Parisi's CRSB ansatz
as an appropriate description of a single neuron beyond capacity, within the limits of an equilibrium, averaged
statistical mechanical treatment.

As shown in [153], the effect of frustration, manifesting itself in the spontaneous breaking of RS beyond capacity,
brings along, from the viewpoint of numerical simulations, a very hard, NP-complete problem. That means that whatever
algorithm is used to find an $N$-dimensional vector of synaptic couplings $J$ with the smallest possible number of
misclassified examples, the time necessary for it is expected (a rigorous proof is not known) to increase faster than
any power law with $N$. Simple algorithms that minimize the number of misclassifications locally, i.e., within a
certain neighborhood of the initial choice for $J$, are due to Wendemuth [154,37]. While his result on the error
measuring the number of unstable patterns significantly overestimated the error, as demonstrated in Ref. [?] and
cited in the present paper, his algorithm may still yield acceptable approximations for global minimization as
predicted by the CRSB theory. We refer also to Section VII.3 in [?] for the analogous observations in the context of
the SK model. Returning to generic NP-complete problems, by admitting some random element in the algorithm, the
numerical effort can be reduced to some power of $N$, hence the name non-deterministic polynomial that NP stands
for. The price to be paid then is that the absolute minimum will be found only with a certain probability [155].
A most widely used such method is simulated annealing [159] and its descendants [160]. As pointed out in [?], the
average time required for the numerical solution may undergo a dramatic change if certain parameters are varied,
without changing its NP-completeness. Therefore, the so-called worst-case scenario, on which the classification as
NP-complete is based, may in fact not capture very well the typical behavior, occurring with probability 1 as
$N \to \infty$, of such algorithms in specific applications. Conversely, a proof that a problem can be solved
deterministically within polynomial time may still allow very long times for an algorithm to converge. Nevertheless,
NP-completeness is generally considered as the signature of algorithmically hard tasks.

It is natural to expect that some, possibly most, of the rigorous results and alternatives to the replica method for
the SK and Little-Hopfield models can be carried over to the simple perceptron. However, so far only the cavity
method in its simplest form is available, equivalent to an RS solution, together with a self-consistency condition
equivalent to the AT stability condition of the RS solution [161-164].

Beyond the critical capacity $\alpha_c = 2$ RS spontaneously breaks, entailing, like in the SK model, an ultrametric
organization [30], [28,123] of the synaptic couplings $J$ that minimize, in the ground state, the number of incorrect
input-output relations $S^\mu \to \xi^\mu$, $\mu = 1, 2, \dots, M$, in (2.1). Below $\alpha_c$, a complementary
picture arises by introducing "cells" on the $N$-dimensional sphere of synaptic couplings

$$ C_\sigma = \{ J \mid J^2 = N, \ \mathrm{sign}(J \cdot S^\mu) = \sigma^\mu, \ \mu = 1, 2, \dots, M \} \qquad (2.11) $$

labeled by the $2^M$ possible output sequences $\sigma = \{\sigma^\mu\}$. The idea to study the simple perceptron in
terms of these cells $C_\sigma$ is to some extent already contained in Cover's geometrical derivation of the storage
capacity [149] and has been employed again in [165]. An appropriate quantitative framework has been elaborated by
Monasson and co-workers [125,166-168] in the context of multi-layer networks and has later been adapted to the simple
perceptron in [169,171]. Based on a replica calculation, this method enables one to characterize the distribution of
cell sizes $|C_\sigma|$ to exponentially leading order in $N$ in terms of a so-called multifractal spectrum, similarly
as in the thermodynamical formalism for fractals [172,173]. This multifractal analysis opens an interesting view on
the storage as well as the generalization properties of the simple perceptron.



2. Ising couplings 



Storage properties change considerably if one restricts the analysis to so-called Ising couplings, where each
component of $J$ can take only the two possible values $\pm 1$. This extra constraint is partly motivated by the fact
that in a digital computer the $J_j$-s have a discrete representation. It has been observed already by Gardner and
Derrida [?] that a self-contained treatment by an RS ansatz of the critical storage capacity with Ising couplings is
not possible within a canonical statistical mechanical approach.

Krauth and Mézard performed a 1-RSB analysis with the prominent result $\alpha_c \simeq 0.833$ for the critical
storage capacity of the Ising perceptron [174]. Their 2-RSB explorations furthermore indicate that no new solution
arises w.r.t. 1-RSB. The RS state turns out to be globally stable up to the capacity limit, the latter being signaled
by a vanishing of the entropy. This is an intriguing coincidence that could not have been foreseen by the RS analysis,
because therein the point whence the entropy becomes negative is obviously only an upper limit for the capacity.



The need for RSB to calculate the capacity should be contrasted with the spherical case, see Section II E 1, where
the capacity could be determined within the RS solution. The reason for the difficulty here is that the transition
from perfect to imperfect storage is discontinuous for Ising couplings. Here the order parameter exhibits a jump in
the sense that one of the overlaps in 1-RSB is not the continuation of the RS value when $\alpha$ passes $\alpha_c$.
From the viewpoint of the order parameter such a transition can be termed first order. On the other hand, since the
probability weight of the discontinuously appearing order parameter value vanishes at $\alpha_c$, the first derivative
of the free energy remains continuous and only the second one jumps. The Ising neuron also demonstrates the importance
of global stability of a state. The RS solution formally exists beyond the transition and stays locally, i.e.,
AT-stable up to $\alpha = 4/\pi$. However, its free energy is smaller than that from 1-RSB, so global stability
appears to be taken over by the 1-RSB solution, like in first order transitions. It should be added that here the
locally stable but globally unstable RS solution should be ruled out as a metastable state in the traditional sense
because of its negative entropy. Furthermore, the 1-RSB solution is not a locally stable equilibrium state before the
transition, so two spinodal points collapse onto the transition point.

While a major part of the existing statistical mechanical investigations - including the SK model in (2A) and 
our present study of the simple perceptron - are based on a canonical Boltzmannian formulation of the problem, 
Gardner's seminal calculations in Refs. [?] use the microcanonical ensemble. For the Ising perceptron, this approach
was adopted by Fontanari and Meir [175], reproducing Krauth and Mézard's results without going beyond RS and
verifying in particular the AT stability condition [90] as well as the physical requirement of a non-negative entropy.

Computing the optimal vector $J$ of synaptic couplings for the Ising perceptron is an NP-complete problem [?]
for any positive load parameter $\alpha$, as demonstrated in Refs. [176,177]. The challenge of numerically estimating
the critical capacity $\alpha_c$ has been attacked by several groups, most of them verifying $\alpha_c \simeq 0.833$,
with the exception of Ref. [178], criticized by the comment in [179]. Subsequent, more extensive computations in [?]
appear to confirm the original critical value.
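To make the computational difficulty concrete, the optimal Ising coupling vector can always be found by exhaustive enumeration of all $2^N$ candidates, which is feasible only for very small $N$. The following is a minimal sketch under that assumption; the function name is hypothetical, and a vanishing field is counted as an error by convention.

```python
from itertools import product


def min_errors_ising(patterns, outputs):
    """Exhaustive search over all J in {-1, +1}^N for the smallest number of
    misclassified patterns; cost grows as 2**N, so N must stay small."""
    N = len(patterns[0])
    best_errors, best_J = len(patterns) + 1, None
    for J in product((-1, 1), repeat=N):
        errors = sum(
            1
            for S, xi in zip(patterns, outputs)
            if xi * sum(j * s for j, s in zip(J, S)) <= 0
        )
        if errors < best_errors:
            best_errors, best_J = errors, J
    return best_errors, best_J


if __name__ == "__main__":
    patterns = [(1, 1, 1), (1, -1, 1), (-1, 1, 1)]
    outputs = [1, 1, 1]
    print(min_errors_ising(patterns, outputs))
```

Heuristic methods such as simulated annealing replace this enumeration in practice, at the price of finding the optimum only with some probability.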

Below critical capacity, a multifractal analysis of the space of Ising couplings $J$, inspired by the work on the
spherical case [125] as discussed in the previous section, has been worked out in [169,182]. Beyond criticality, a
thermodynamical stability analysis [183] suggests that 1-RSB is locally stable at and beyond $\alpha_c$. On the other
hand, also the microcanonical RS approach of Fontanari and Meir [175] continues to coincide with Krauth and Mézard's
results and satisfies the local thermal stability criterion of de Almeida and Thouless [90].

The above numerical and analytical findings have given rise to the conjecture that the Ising perceptron beyond
capacity behaves quite similarly to Derrida's random energy model [?]. This system is the $p \to \infty$ limit of the
$p$-spin interaction version of the SK model. In particular, the 1-RSB ansatz indeed yields [?] what is accepted as
the exact solution of the problem within the canonical Boltzmannian approach, and the zero entropy condition marks the
transition from RS to 1-RSB. Interestingly, as was done originally by Derrida, even the spin glass phase of the
random energy model can be described by the replica method, but without the need to introduce the 1-RSB ansatz.
There, by a direct calculation, the mean free energy could be maximized without dealing with spin overlaps, so this
can be considered as an independent confirmation of RSB as applied later by [?].

In the case of the neuron with Ising couplings, like in the random energy model, an overlap $q_1 = 1$ arises, with
probability exactly $1 - x = 1 - T/T_c$. The fact that the microcanonical formulation within RS gave the same minimal
error, i.e., ground state error, beyond capacity [175] as the canonical 1-RSB result [174] is a further peculiarity of
Ising synapses. There is no technical contradiction, however, because if $q_1 = 1$ is set then the 1-RSB free energy
becomes equivalent to the RS microcanonical entropy. This can be understood if one realizes that in the latter the
temperature is essentially an extra variational parameter, taking the role of $1/x$, related to the aforesaid
probability in 1-RSB. The special nature of the microcanonical approach was interpreted, and exploited for calculating
the storage capacity of certain multi-layer perceptrons, in Ref. [184].

Further systems where stable 1-RSB phases arise, albeit generally without the zero entropy condition, are the $p$-spin
interaction SK model [?], its spherical variant [?], the spherical, multi-$p$-spin interaction model [?], the Potts
glass [?], and protein folding models [?].

The general framework in the present paper includes both continuous and Ising synaptic couplings $J$. Since the
case of principal interest here is Parisi's CRSB ansatz, in the quantitative numerical evaluation beyond capacity we
will focus on the example of the continuous, spherical, couplings. Whether or not a continuous RSB ansatz will be
necessary for more general Ising networks than the McCulloch-Pitts model, e.g., in multi-layer Ising perceptrons,
remains to be seen [?,185,186].



F. Training, error measures, and retrieval 



We recall that the patterns to be stored are prescribed as pairs $S^\mu$, $\xi^\mu$, $\mu = 1, \dots, M$, and the
McCulloch-Pitts neuron (2.1) is required to reproduce $\xi^\mu$ in response to $S^\mu$. Next we define the so-called
local stability parameters

$$ \Delta^\mu = \xi^\mu \, |J|^{-1/2} \sum_{k=1}^{N} J_k S_k^\mu , \qquad (2.12) $$

where the normalization factor $|J|^{-1/2}$ guarantees a sensible behavior in the thermodynamical limit $N \to \infty$
if the patterns $S^\mu$ are normalized to a length of the order $N^{1/2}$. Introducing an error measure on a pattern
as $V(\Delta^\mu)$, with $V(y)$ called a "potential", one is led to the Hamiltonian

$$ \mathcal{H} = \sum_{\mu=1}^{M} V(\Delta^\mu) . \qquad (2.13) $$



Minimizing this Hamiltonian in the space of couplings $J$ is the task of training. In particular, maximizing the
number of correctly stored patterns in (2.1) is equivalent to minimizing (2.13) if one chooses the potential
$V(y) = \theta(-y)$, where $\theta(y)$ denotes the Heaviside step function. Training in $J$ space contrasts with the
neuron (spin) dynamics of the Little-Hopfield model, aimed at retrieving stored patterns.

If more than plain memorization of the classifications $\xi^\mu$ of the training examples is required, then other
choices of $V(y)$ may be advantageous. For instance,

$$ V(y) = \theta(\kappa - y)\,(\kappa - y)^b \qquad (2.14) $$



with positive $\kappa$ and $b$ tries to impose, upon minimization in (2.13), the conditions $\Delta^\mu > \kappa$ on
all the local stabilities (2.12). For the step function potential, $b = 0$, the number of violations of
$\Delta^\mu > \kappa$ is minimized, but those $\Delta^\mu$ which violate the condition may take values arbitrarily far
below $\kappa$. For a softer potential with $b > 0$, a compromise must be made between minimizing the number of
violations and the "cost" $(\kappa - \Delta^\mu)^b$ of the committed error. In any case, the qualitative effect of
positive $\kappa$ after minimization in (2.13) is that inputs $S$ in (2.1) close but not identical to one of the
stored patterns can still be associated with the correct output. For load parameters (2.10) below the critical
capacity, $\alpha < \alpha_c$, one will typically choose the largest possible $\kappa$-value admitting a zero training
error in (2.13) and thus $\Delta^\mu > \kappa$ for all patterns. This maximal $\kappa_{\max}(\alpha)$ as a function of
the load parameter $\alpha$ has been calculated by Gardner and Derrida in [?]. Note that $\kappa_{\max}(\alpha)$ is the
same for any $b$ and that $\kappa_{\max}(\alpha_c) = 0$. Beyond the critical capacity, $\alpha > \alpha_c$, not all the
training patterns can be stored anyway, thus sacrificing some additional ones by choosing $\kappa > 0$ may still be
desirable to create a finite basin of attraction in the retrieval dynamics for patterns for which
$\Delta^\mu > \kappa$ can be achieved [?]. Attractors of the dynamics (2.2) not corresponding to one of the stored
patterns, spurious states, represent failure of memorization. We also mention that training the $J$ couplings for each
neuron separately leads to lifting the symmetry of the $J_{ij}$-s in the original Little-Hopfield model. That leads to
the loss of the equilibrium being described by a Hamiltonian, and to a dynamics exhibiting more complex time series
than convergence to a fixed point [?].
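The error measures (2.13) and (2.14) can be sketched numerically as follows. This is a minimal illustration, not the procedure of the cited works: the function names are hypothetical, the field is normalized by the Euclidean norm of $J$ (an assumption of the sketch that keeps the stabilities of order one for patterns of length $N^{1/2}$), and the convention $\theta(0) = 0$ is assumed.

```python
import math


def local_stabilities(J, patterns, outputs):
    """Stability xi * (J . S) / |J| of every pattern (Euclidean norm assumed)."""
    norm = math.sqrt(sum(j * j for j in J))
    return [
        xi * sum(j * s for j, s in zip(J, S)) / norm
        for S, xi in zip(patterns, outputs)
    ]


def potential(y, kappa=0.0, b=0.0):
    """V(y) = theta(kappa - y) * (kappa - y)**b, with theta(0) = 0 assumed."""
    gap = kappa - y
    if gap <= 0.0:
        return 0.0
    return gap ** b  # b = 0 simply counts the violation


def hamiltonian(J, patterns, outputs, kappa=0.0, b=0.0):
    """Sum of the potential over all patterns, cf. Eq. (2.13)."""
    return sum(potential(d, kappa, b)
               for d in local_stabilities(J, patterns, outputs))


if __name__ == "__main__":
    J = [1.0, 1.0]
    patterns = [[1, 1], [1, -1], [-1, -1]]
    outputs = [1, 1, 1]
    print(hamiltonian(J, patterns, outputs))             # number of errors
    print(hamiltonian(J, patterns, outputs, kappa=1.0))  # violations of stability > 1
```

With $b = 0$ and $\kappa = 0$ the sum reduces to the number of patterns that are not stored, i.e., the Gardner-Derrida error measure.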

The above concept of training corresponds to $T = 0$ dynamics in $J$ space, and can be complemented by a stochastic
element to represent positive temperatures. The main focus of the present paper is describing the final equilibrium
states of such dynamics.



A further motivation for studying potentials $V(y)$ even more general than in (2.14) is the fact that a discrete
time version of the gradient descent dynamics of $J$ in the corresponding energy landscape (2.13) reproduces several
well-known learning algorithms [?]. For instance, the potential (2.14) with $b = 1$ induces a dynamics very similar to
the perceptron algorithm of Rosenblatt [?] and later Gardner [?]. Beyond capacity, when the convergence of such
algorithms to a state with minimal positive error is not proven, there is only an intuitive ground for using such
algorithms, and obviously modifications are necessary [154,37].
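A sequential, discrete time descent on the potential (2.14) with $b = 1$ can be sketched as follows. This is a hedged illustration rather than any of the cited algorithms: the function names are hypothetical, the norm of $J$ is left free instead of being constrained, and the learning rate `eta` absorbs the factor $N^{-1/2}$ of the gradient. For $\kappa = 0$ the update reduces to the familiar perceptron rule.

```python
import math


def train_perceptron_like(patterns, outputs, kappa=0.0, eta=1.0, max_sweeps=500):
    """Sequential descent on V(y) = theta(kappa - y) * (kappa - y).

    Whenever the field xi * J.S / sqrt(N) of a pattern does not exceed kappa,
    J is moved along xi * S, the descent direction of that pattern's cost.
    """
    N = len(patterns[0])
    J = [0.0] * N
    for _ in range(max_sweeps):
        updated = False
        for S, xi in zip(patterns, outputs):
            field = xi * sum(j * s for j, s in zip(J, S)) / math.sqrt(N)
            if field <= kappa:
                J = [j + eta * xi * s for j, s in zip(J, S)]
                updated = True
        if not updated:  # a full sweep without violation: all patterns stored
            break
    return J


def count_errors(J, patterns, outputs):
    """Number of patterns whose output is not reproduced by sign(J . S)."""
    return sum(
        1
        for S, xi in zip(patterns, outputs)
        if xi * sum(j * s for j, s in zip(J, S)) <= 0
    )
```

Below capacity the loop terminates once no pattern violates the condition. Beyond capacity it runs until `max_sweeps` is exhausted and returns whatever $J$ was reached, which illustrates why modified algorithms are needed in that regime.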

Next we turn to the retrieval behavior of the Little-Hopfield associative memory network dynamics (2.2),
characterized by the time dependent overlaps

$$ m^\mu(t) = \frac{1}{N} \sum_{k=1}^{N} S_k^\mu \, S_k(t) \qquad (2.15) $$

of the processed pattern $S(t)$ with the stored patterns (fixed point attractors) $S^\mu$. An input pattern
$S = S(0)$ is associated under the dynamics (2.2) with the stored pattern $S^\mu$ if $m^\mu(t)$ evolves towards 1 in
the course of time, while $m^\nu(t) \to 0$ for all other patterns $\nu \ne \mu$. The smallest value of $m^\mu(0)$
which still leads to a successful retrieval, i.e., $m^\mu(t) \to 1$, is a measure for the basin of attraction of the
stored pattern $S^\mu$.



In the thermodynamical limit $N \to \infty$ the following result for the first time step of the evolution in (2.2)
has been derived in [187,188]:

$$ m^\mu(1) = \int \rho(\Delta)\, \mathrm{erf}\!\left( \frac{m^\mu(0)\,\Delta}{\sqrt{2\,[1 - m^\mu(0)^2]}} \right) d\Delta , \qquad (2.16) $$

where $\mathrm{erf}(x) = 2\pi^{-1/2} \int_0^x e^{-y^2}\, dy$ and $\rho(\Delta)$ is the distribution of the local
stabilities from (2.12), defined as

$$ \rho(\Delta) = \frac{1}{M} \sum_{\mu=1}^{M} \delta(\Delta - \Delta^\mu) . \qquad (2.17) $$



In general, $\rho(\Delta)$ depends on the algorithm by which the vector of synaptic couplings in (2.12) has been
computed. It has been assumed that all the McCulloch-Pitts units in (2.2) have been independently trained according to
the same algorithm, thus in the thermodynamical limit $\rho(\Delta)$ will be the same function for all of them. We
mention that with the Hebb rule this condition of independence does not hold, thus (2.16) is not valid for the
original version of the Little-Hopfield model. In the case when $J$ has been obtained by minimizing a Hamiltonian
function of the general form (2.13), the resulting distribution of local stabilities $\rho(\Delta)$ will be one of the
most important quantities of our present work.



When (2.13) is minimized with the maximal $\kappa$-value in (2.14) admitting an error-free storage of all training
patterns, i.e., $\kappa = \kappa_{\max}(\alpha)$, Kepler and Abbott [187] have observed numerically that retrieval is
successful if and only if

$$ m^\mu(1) > [1 + m^\mu(0)]/2 . \qquad (2.18) $$

In the thermodynamical limit this seems to be exact, or a very good approximation, at least for sufficiently small
load parameters $\alpha$ such that $\kappa_{\max}(\alpha) > 0.6$ [187].
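For a finite set of measured local stabilities, the distribution (2.17) turns the integral (2.16) into a plain average, so the first time step and the criterion (2.18) can be evaluated in a few lines. The sketch below assumes exactly this empirical form; the function names are hypothetical.

```python
import math


def first_step_overlap(m0, stabilities):
    """Overlap after one time step, Eq. (2.16), with the empirical
    distribution (2.17): an average of erf over the given stabilities."""
    if not 0.0 <= m0 < 1.0:
        raise ValueError("initial overlap must satisfy 0 <= m0 < 1")
    scale = m0 / math.sqrt(2.0 * (1.0 - m0 * m0))
    return sum(math.erf(scale * d) for d in stabilities) / len(stabilities)


def retrieval_expected(m0, stabilities):
    """Numerically observed criterion (2.18): m(1) > [1 + m(0)] / 2."""
    return first_step_overlap(m0, stabilities) > 0.5 * (1.0 + m0)


if __name__ == "__main__":
    for delta in (0.5, 1.0, 2.0, 4.0):
        m1 = first_step_overlap(0.5, [delta] * 10)
        print(f"stability {delta:.1f}: m(1) = {m1:.3f}, "
              f"retrieval expected: {retrieval_expected(0.5, [delta] * 10)}")
```

The example uses a degenerate distribution with all stabilities equal, which is only a convenient test case and not a distribution produced by any particular training algorithm.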

In general, the further time evolution of $m^\mu(t)$ becomes increasingly more complicated than the first time step
(2.16). Analytical approximations as well as numerical studies for various specific learning rules for the synaptic
couplings $J$ (including the Hebb rule) have been elaborated in [140,145,189-192]. For randomly dilute networks such
that the fraction of non-zero synaptic couplings $J_{ik}$ in (2.2) tends to zero like $N^{-1} \ln N$ in the
thermodynamical limit it has been shown in [188] that the same dynamics for $m^\mu(t)$ as in (2.16) remains valid for
arbitrary times $t$, provided the initial condition $S(0)$ has an appreciable overlap with only one of the stored
patterns $S^\mu$. Further interesting explorations along these lines can be found in Refs. [144,193,194].

A question of particular interest for our present study has been addressed by Griniasty and Gutfreund [?], namely
whether it may be an advantage with respect to the retrieval properties to increase $\kappa$ in (2.14) beyond the
threshold $\kappa_{\max}(\alpha)$ of error-free storage in the minimization of (2.13). For randomly dilute networks
they demonstrated analytically that this is indeed the case provided $\alpha < \alpha_c(\kappa = 0) = 2$, but that the
critical storage capacity $\alpha_c = 2$ itself cannot be surpassed by this trick. For $\alpha < \alpha_c(\kappa = 0)$,
the effect of choosing $\kappa > \kappa_{\max}(\alpha)$, with $b = 1$ in (2.14), is twofold. On the one hand, the
patterns themselves are no longer attractors but converge under the dynamics (2.2) towards nearby fixed points. On the
other hand, the basins of attraction of these fixed points steadily grow as $\kappa$ exceeds $\kappa_{\max}(\alpha)$
and rather soon reach the "full basin scenario", i.e., every input pattern $S = S(0)$ with a finite initial overlap
$m^\mu(0) > 0$ will converge towards the same attractor as $S^\mu$ does. We remark that these conclusions in [?] are
based on an RS ansatz which is not rigorously valid [?] for the potential (2.14) with $\kappa > \kappa_{\max}(\alpha)$
and $0 \le b \le 1$. The CRSB scheme of our present work may be needed for an exact treatment, though the quantitative
corrections are not expected to be large.



G. Multi-layer perceptrons 



As far as practical applications to real problems are concerned, multi-layer perceptrons are the most important
networks tractable within a statistical mechanical approach. They have great computational abilities and at the same
time are not prohibitively complicated due to the absence of feedback effects. Still, the very property that these
architectures are able to implement nontrivial tasks of practical interest makes their theoretical analysis difficult.
Qualitatively, the flexibility of multi-layer perceptrons is due to the fact that the individual McCulloch-Pitts units
within each layer can share the effort to produce the correct output. On the one hand, this "division of labor" gives
rise to intricate anti-correlations between their activities [186,195-198]. On the other hand, for a not too small set
of training examples, it brings along a spontaneous breaking of their permutation symmetry, possibly superimposed in
addition by a spontaneous breaking of replica symmetry [184]. Note that permutation symmetry of units in a layer is
understood in the average sense; for a given training set of patterns there is generically no such symmetry. We will
not present here a systematic discussion of the ongoing research on these topics but rather highlight two particular
aspects of specific interest from the viewpoint of the simple perceptron analysis in our present paper: one is the
capacity of multi-layer networks, and the other the possibility to mimic multi-layer structures with a single unit
with a non-monotonic transfer function. For a more detailed overview, especially regarding learning algorithms and
generalization properties, we refer to [19,199] and for the present state-of-the-art to [197,200-203] and further
references therein.

The storage capacity of multi-layer perceptrons has been analyzed within a statistical mechanical approach for the
first time in [186,195,204] for spherical and in [185,186] for Ising perceptrons, addressing the simplest case with
one adaptive input layer (first layer) and one hidden layer (second layer). The latter is governed by a pre-wired
Boolean function, mostly either a so-called committee or a parity machine. For an Ising parity machine with
non-overlapping receptive fields, i.e., tree architecture, 1-RSB seems to be exact [185]. For fully connected machines
with spherical synaptic weights [186,195,204,205], the assumption of RS cannot be upheld since it yields results
incompatible with the rigorous bound of Mitchison and Durbin [152] (see also [?]), based on a generalization of
Cover's line of reasoning [149]. While an improved 1-RSB ansatz respects these limits, the necessity of additional
steps of RSB in order to draw reliable quantitative conclusions remains unclear [186,205].

A first alternative approach [184] suggests to break the permutation symmetry of the hidden units explicitly prior to
the actual replica calculations, but the resulting equations are approximations and difficult to solve for large
networks. A second alternative method is the cavity approach, elaborated on a level equivalent to an RS ansatz in
[162,164]. A most promising new roadway seems to be the multifractal analysis of the space of synaptic couplings by
Monasson and co-workers [125,166-168]. One of the most remarkable findings of these and subsequent works [?] is that
an RS ansatz in this approach yields results very close but not identical to those of a 1-RSB ansatz in the standard
treatment along the lines of Gardner and Derrida [185,186,195,204,205].

For our present study the salient observation is [197] that the increased power of multi-layer networks in comparison
with the simple perceptron stems from the possibility that the single McCulloch-Pitts units may all operate in the
region beyond their individual storage capacity, while the network as a whole is still below its maximal storage
capacity. The reason is that, via the division of labor, the errors of one unit may be rectified by another one, or
made up for collectively. Specifically, results for a simple perceptron beyond its storage capacity have been utilized
for the exploration of multi-layer networks in [201,209].



As a generalization of the simple perceptron with input-output relation (2.1), the following setup was introduced
in [210]:

$$ \xi = V_\gamma\!\left( |J|^{-1/2} \sum_{k=1}^{N} J_k S_k \right) , \qquad (2.19) $$

$$ V_\gamma(y) = \mathrm{sign}\big( [y - \gamma]\, y\, [y + \gamma] \big) . \qquad (2.20) $$

Like in (2.12), the scaling by $|J|^{-1/2}$ guarantees a sensible thermodynamical limit $N \to \infty$ and, unless
indicated otherwise, we will focus on the case of a spherical constraint $J^2 = N$. The potential (2.20) outputs 1 if
$y > \gamma$ or $-\gamma < y < 0$ and $-1$ otherwise ($V_\gamma(-y) = -V_\gamma(y)$), hence the name "reversed-wedge
perceptron" for the input-output relation (2.19) was coined in [?]. Without loss of generality, one can focus on
non-negative parameters $\gamma$, reproducing the simple perceptron (2.1) for $\gamma = 0$, and its equivalent
reversed counterpart for $\gamma \to \infty$. In [210,212,213] the reversed-wedge perceptron (2.19) was studied as a
generalization of the simple perceptron, with an increased storage capacity as the main result. As revealed in [214],
the assumption of RS, on which those first works are based, ceases to fulfill the AT stability condition before the
limit of capacity is reached, and an improved 1-RSB calculation modifies the storage capacity by more than a factor of
2 for $\gamma$-values of the order of one. It has been conjectured [125] that a consistent treatment of the problem is
only possible by means of the general Parisi RSB framework. The storage problem in (2.19) is equivalent to a
minimization of the cost function (2.13) with the potential from (2.20), so the problem becomes a special case of the
theory presented later in this paper.
In [214] it was observed that by rewriting (2.19, 2.20) as

$$ \xi = \prod_{j=1}^{3} \mathrm{sign}\!\left( |J|^{-1/2} \sum_{k=1}^{N} J_k S_k - \theta_j \right) \qquad (2.21) $$

with $\theta_j = (j - 2)\gamma$, the reversed-wedge perceptron may also be looked upon as a toy model of a multi-layer
perceptron. To see this, we first note that each factor of the form $\mathrm{sign}(y - \theta)$ in (2.21) is a
generalization of the simple perceptron (2.1) with a "firing threshold" $\theta$ as new feature. Such a threshold has
a well founded biological basis but has been omitted from many a theoretical study [19]. A systematic exploration of
perceptrons with a threshold by way of a replica analysis has been undertaken in [?]. Returning to (2.21) we see that
this input-output relation represents a special kind of a two-layer perceptron with three McCulloch-Pitts units,
endowed with different thresholds but identical synaptic weights $J$ in the first layer, and a non-adaptive second
layer, pre-wired according to a so-called parity machine. Besides the two-layer architecture the toy model suggests,
and the occurrence of RSB before the maximal storage capacity is reached, several further features of the
reversed-wedge perceptron (2.21) have been found to qualitatively agree with characteristic properties of real
multi-layer networks [126,214,215].
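The equivalence between the reversed-wedge transfer function (2.20) and a parity of three threshold units with $\theta_j = (j - 2)\gamma$ is easy to check numerically. The sketch below does so on a grid of field values; the function names are hypothetical and the convention $\mathrm{sign}(0) = 0$ is an assumption of the sketch.

```python
def sign(x):
    """Sign function with sign(0) = 0 (convention assumed for this sketch)."""
    return 1 if x > 0 else -1 if x < 0 else 0


def reversed_wedge(y, gamma):
    """Transfer function sign([y - gamma] * y * [y + gamma]), cf. Eq. (2.20)."""
    return sign((y - gamma) * y * (y + gamma))


def three_unit_parity(y, gamma):
    """Product of three threshold units with theta_j = (j - 2) * gamma."""
    out = 1
    for j in (1, 2, 3):
        out *= sign(y - (j - 2) * gamma)
    return out


if __name__ == "__main__":
    gamma = 1.0
    for y in (2.0, 0.5, -0.5, -2.0):
        print(y, reversed_wedge(y, gamma), three_unit_parity(y, gamma))
```

Both functions give $+1$ for $y > \gamma$ and for $-\gamma < y < 0$, and $-1$ otherwise, which is the pattern of outputs described for the potential (2.20).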
As shown in [125,214], the version space is partitioned into an exponentially large number of disconnected components
for any positive load parameter $\alpha$. Nevertheless, up to a certain finite $\alpha$-value, the RS solution appears
to be correct [125]. This observation invalidated the hitherto widespread belief that unbroken RS signals a connected
(possibly convex) version space and that RSB is tantamount to the breaking of ergodicity. The quite subtle point is
that, in the thermodynamical limit $N \to \infty$, each of the exponentially many disconnected components of the
version space has an infinitesimal contribution to the full volume, thus validating the RS result, while beyond a
certain $\alpha$ they become small in number, but each has a significant relative contribution to the volume of
version space, thus causing RSB.

In all previously mentioned cases, RSB was intimately connected with frustration and a rugged energy landscape
with nearly degenerate minima. In the case of the reversed-wedge perceptron, below its maximal storage capacity,
there is no frustration. Then all constraints in the form of input-output relations (2.19) are satisfied for vectors
$J$ belonging to the version space. The local minima may now be identified with the exponentially many disconnected
domains of the version space, each having exactly the same energy and being completely flat with bottom level at zero
energy (muffin-tin shape). In place of a spontaneous one may now rather speak of an induced breaking of ergodicity,
which can be attributed to the non-monotonicity of the potential, and may be the reason why RS remains applicable
for smaller $\alpha$-s. Due to the absence of frustration it may come as a surprise that, for sufficiently large
$\alpha$ values below capacity, Parisi's scheme, including the ultrametric organization of the ergodic components, is
apparently still applicable. A very similar situation arises for potentials in (2.14) with negative $\kappa$-values
[5,199] and for certain unsupervised learning scenarios [216,217], involving potentials of the form

$$ V(y) = \theta(\kappa - |y|) , \qquad (2.22) $$

in the regime below the respective critical capacity value of the load parameter $\alpha$. Beyond the critical
$\alpha$-value, in all cases frustration sets in, where it is natural to expect Parisi's RSB scenario.

Various generalizations of the reversed-wedge perceptron have been explored, two of which we find particularly interesting. In [218] the case of more than three discontinuities in ( j2.2C| ) has been considered. As the number of discontinuities increases, the maximal storage capacity is found to increase, and the consideration of RSB effects becomes more and more important for quantitatively reliable results. The reversed-wedge Ising perceptron with $\gamma = (2\ln 2)^{1/2}$ in (5^3) was demonstrated in [127] to saturate the information theoretical upper bound for the maximal storage capacity, a fact which has found its natural physical explanation by means of a multifractal analysis of the version space in [219]. The concomitant vanishing of the Edwards-Anderson order parameter has, like in the high temperature regime of the SK model [ 12S ], the consequence that the annealed approximation coincides with the RS solution p7|.

III. STATISTICAL MECHANICS OF PATTERN STORAGE



We now set the stage for the detailed study of the single neuron by introducing the model and reviewing basic statistical mechanical notions. With the exception of Sections V C and VI C, the style of presentation is meant to be self-contained henceforth. Some overlap with Section II is the consequence.



A. The model 

We consider the McCulloch-Pitts model neuron |l9f| ,

$$ \xi = \mathrm{sign}(h), \tag{3.1a} $$
$$ h = N^{-1/2} \sum_{k=1}^{N} J_k S_k, \tag{3.1b} $$

where $\mathbf{J}$ is the vector of synaptic couplings, $\mathbf{S}$ the input and $\xi$ the response. The normalization was chosen so that $h$ is typically of $O(1)$ when $N \to \infty$. Patterns to be stored are prescribed as pairs

$$ \{\mathbf{S}^\mu, \xi^\mu\}_{\mu=1}^{M} \tag{3.2} $$

such that the neuron is required to generate $\xi^\mu$ in response to $\mathbf{S}^\mu$. Given the ensemble of patterns, the local stability parameter

$$ \Delta^\mu = h^\mu \xi^\mu \tag{3.3} $$

obeys some distribution $\rho(\Delta)$ flq]. The $\mu$-th pattern is stored by the neuron if the actual response signal from Eq. (3.1) equals the desired output, i.e., $\Delta^\mu > 0$. The number of patterns $M$ is generically of order $N$, so

$$ \alpha = M/N \tag{3.4} $$

is an intensive parameter. For the sake of simplicity, we generate the $S_k^\mu$-s independently from a normal distribution, and take $\xi^\mu = \pm 1$ equally likely. The corresponding probability density will be denoted by $P(\{\mathbf{S}^\mu, \xi^\mu\})$.



Since the output $\xi$ in (3.1) is invariant if $\mathbf{J}$ is multiplied by a positive factor, it is useful to eliminate this degree of freedom by the spherical constraint $|\mathbf{J}| = \sqrt{N}$. In general, a prior distribution $w(\mathbf{J})$, not necessarily normalized, expresses our initial knowledge about the synapses. The spherical constraint, which we choose to normalize, corresponds to

$$ w(\mathbf{J}) = C_N\, \delta\!\left(N - |\mathbf{J}|^2\right), \tag{3.5a} $$
$$ C_N = N\, \Gamma(N/2)\, (N\pi)^{-N/2}. \tag{3.5b} $$
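As a consistency check of the normalization, the constant can be recovered by a short computation in polar coordinates. The sketch below assumes the reading $C_N = N\,\Gamma(N/2)\,(N\pi)^{-N/2}$ of (3.5b) and uses the standard surface area $S_N$ of the unit sphere in $N$ dimensions, a symbol introduced here only for this check.

```latex
\int d^N J\, \delta\!\left(N - |\mathbf{J}|^2\right)
  = S_N \int_0^\infty dr\, r^{N-1}\, \delta\!\left(N - r^2\right)
  = S_N\, \frac{N^{(N-2)/2}}{2}
  = \frac{(N\pi)^{N/2}}{N\,\Gamma(N/2)},
\qquad
S_N = \frac{2\pi^{N/2}}{\Gamma(N/2)} .
```

The inverse of the right-hand side is the stated $C_N$, so that $\int d^N J\, w(\mathbf{J}) = 1$. For $N = 2$ this gives $C_2 = 1/\pi$.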

Another generic type of prior distribution prescribes independent, identical constraints for the synapses as

$$ w(\mathbf{J}) = \prod_{k=1}^{N} w_0(J_k), \tag{3.6} $$

e.g., binary, or Ising, synapses have

$$ w_0(J) = \delta(J-1) + \delta(J+1). \tag{3.7} $$

This prior distribution is not normalized. Its scale is conveniently set by requiring that $J_k^2$ averages to unity, that is, $\int w_0(J) J^2\, dJ = \int w_0(J)\, dJ$, whence $N^{-1}\sum_{k=1}^{N} J_k^2$ goes to 1 for large $N$. "Soft spins" are generated by smooth, multiple-peaked $w_0(J)$-s.

Our main goal is to find those $\mathbf{J}$-s that store the prescribed patterns, i.e., are compatible with the patterns. The problem can be reformulated as an optimization task with a suitable cost function, i.e., Hamiltonian, to be minimized. A convenient choice here is the sum of errors committed on the patterns

$$ H = \sum_{\mu=1}^{M} V(\Delta^\mu), \tag{3.8} $$

where the potential $V(\Delta^\mu)$ measures the error on the $\mu$-th pattern $(\mathbf{S}^\mu, \xi^\mu)$. A natural error measure $V(y)$ is zero for arguments larger than a given threshold $\kappa$ and monotonically decreases elsewhere [0. Storage as defined above corresponds to $\kappa = 0$, while a $\kappa > 0$ means a stricter requirement on the local stability $\Delta$ and ensures a finite basin of attraction for a memorized pattern during retrieval. The Hamiltonian (3.8) defines through gradient descent a dynamics in the space of couplings $\mathbf{J}$. Specifically,

$$ V(y) = (\kappa - y)^b\, \theta(\kappa - y) \tag{3.9} $$

corresponds to the perceptron and adatron rules for $b = 1, 2$, respectively, where $\theta(y)$ is the Heaviside function, see j^] and references therein. There is no such dynamics in the case $b = 0$, but because of its prominent static meaning (the Hamiltonian counts the incorrectly stored patterns) we will consider that case in concrete calculations. Furthermore, the $\theta(y)$ function can be approximated by a smooth one that does have an associated dynamics. Thus our present study of thermal equilibrium with the Hamiltonian involving (3.9) with $b = 0$ can be thought of as the average asymptotics of such a dynamics. A non-gradient-descent algorithm, designed to minimize the Hamiltonian with $b = 0$, will be discussed in Section VII B 5.
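The definitions of this subsection can be illustrated by a minimal sketch that draws random patterns, builds a spherical coupling vector, and evaluates the Hamiltonian (3.8) with the error measure (3.9). All function names and the sizes N, M are illustrative assumptions, and the value of the Heaviside function at zero argument (a set of measure zero here) is fixed by hand.

```python
import math
import random

def local_stabilities(J, patterns):
    """Delta^mu = xi^mu * h^mu with h = N^{-1/2} sum_k J_k S_k, cf. Eqs. (3.1), (3.3)."""
    n = len(J)
    return [xi * sum(j * s for j, s in zip(J, S)) / math.sqrt(n) for S, xi in patterns]

def potential(y, kappa=0.0, b=0):
    """V(y) = (kappa - y)^b * theta(kappa - y); theta is taken as 0 at y = kappa (assumption)."""
    if y >= kappa:
        return 0.0
    return 1.0 if b == 0 else (kappa - y) ** b

def hamiltonian(J, patterns, kappa=0.0, b=0):
    """Sum of the errors over all patterns, cf. Eq. (3.8)."""
    return sum(potential(d, kappa, b) for d in local_stabilities(J, patterns))

def random_patterns(n, m, rng):
    """Gaussian inputs and equally likely +-1 outputs, as chosen in the text."""
    return [([rng.gauss(0.0, 1.0) for _ in range(n)], rng.choice((-1, 1))) for _ in range(m)]

def spherical_vector(n, rng):
    """Random coupling vector rescaled to |J| = sqrt(N)."""
    J = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(j * j for j in J))
    return [j * math.sqrt(n) / norm for j in J]

rng = random.Random(0)
N, M = 50, 25                      # load alpha = M / N = 0.5 (illustrative sizes)
patterns = random_patterns(N, M, rng)
J = spherical_vector(N, rng)
errors = hamiltonian(J, patterns, kappa=0.0, b=0)   # number of patterns not stored
print("misclassified patterns:", errors, "of", M)
```

For a random, untrained coupling vector roughly half of the patterns are expected to be misclassified; the storage problem asks for those vectors where the count vanishes.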



B. Thermodynamics 

The Hamiltonian (3.8) gives rise to a statistical mechanical system fl |l88|] resembling models of spin glasses with infinite-range interactions |L4]]84|]. A microstate is a specific setting of the synaptic weight vector $\mathbf{J}$, quenched disorder is due to the randomly generated patterns, and a positive temperature $T = \beta^{-1}$ has the effect of introducing tolerance to error of storage. (We use the convention of setting Boltzmann's constant to unity.) The partition function assumes the form

$$ Z = \int d^N J\, w(\mathbf{J}) \exp\left(-\beta \sum_{\mu=1}^{M} V(\Delta^\mu)\right). \tag{3.10} $$



Integration is over the entire real axis if not denoted otherwise. For large $N$ we expect self-averaging [^4|8J], that is, for a given instance of the quenched disorder the extensive thermodynamical quantities are assumed to approach their quenched average. This leads us to the thermal statics of the system, where the question of breaking of ergodicity on some time scales is not dealt with. The replica method Jl4| , p4[ starts with our writing the mean free energy per coupling as

$$ f = -\lim_{N\to\infty} \frac{\langle \ln Z \rangle_{qu}}{N\beta} = \lim_{N\to\infty} \lim_{n\to 0} \frac{1 - \langle Z^n \rangle_{qu}}{nN\beta}, \tag{3.11} $$

where

$$ \langle \cdots \rangle_{qu} = \int P(\{\mathbf{S}^\mu, \xi^\mu\}) \cdots \prod_{\mu=1}^{M} d\xi^\mu\, d^N S^\mu \tag{3.12} $$

stands for the quenched average over patterns. In order to carry on with calculations, it is common practice ]l9| , |i~i| , [S4[ to interchange the limits $n \to 0$ and $N \to \infty$. In what follows we accept the reversal of limits based on numerous examples wherein the consequent results were verified by other analytic methods or numerical simulations, see for example Refs. [|l9 '.
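The identity behind (3.11), namely that $(\langle Z^n \rangle_{qu} - 1)/n$ tends to $\langle \ln Z \rangle_{qu}$ as $n \to 0$, can be checked numerically on toy data. In the sketch below the "partition functions" are simply log-normal random numbers standing in for $Z$ over different disorder samples; this choice, like the function name, is an assumption made for illustration and has nothing to do with the neuron model itself.

```python
import math
import random

def replica_estimate(z_samples, n):
    """(<Z^n> - 1) / n over the samples; tends to <ln Z> as n -> 0."""
    mean_zn = sum(z ** n for z in z_samples) / len(z_samples)
    return (mean_zn - 1.0) / n

rng = random.Random(1)
# Toy positive random numbers playing the role of Z for different pattern sets.
z_samples = [math.exp(rng.gauss(0.0, 1.0)) for _ in range(2000)]

quenched = sum(math.log(z) for z in z_samples) / len(z_samples)   # <ln Z>
annealed = math.log(sum(z_samples) / len(z_samples))              # ln <Z>

for n in (1.0, 0.1, 0.01, 0.001):
    print(f"n = {n:6.3f}   (<Z^n> - 1)/n = {replica_estimate(z_samples, n):+.5f}")
print("quenched average <ln Z> =", quenched)
print("annealed value  ln <Z> =", annealed)
```

The printed sequence approaches the quenched average from above, while the annealed value $\ln\langle Z\rangle$ stays larger, as Jensen's inequality requires.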
Introducing the thermal average as

$$ \langle \cdots \rangle_{th} = Z^{-1} \int \exp\left(-\beta \sum_{\mu=1}^{M} V(\Delta^\mu)\right) \cdots\, w(\mathbf{J})\, d^N J, \tag{3.13} $$

one naturally obtains the mean error per pattern as

$$ \varepsilon = \langle \langle V(\Delta) \rangle_{th} \rangle_{qu}. \tag{3.14} $$

From the free energy this derives as

$$ \varepsilon = \frac{1}{\alpha} \frac{\partial (\beta f)}{\partial \beta}; \tag{3.15} $$

it is thus the analog of the thermodynamical energy. The mean entropy per synapse, or the specific entropy,

$$ s = \beta(\alpha\varepsilon - f) \tag{3.16} $$

is a measure of the volume in coupling space associated with a given mean error.
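The thermodynamic relations for the mean error and the entropy can be verified by brute force on a very small system. The sketch below enumerates all Ising coupling vectors for a single fixed pattern set (so the quenched average is omitted, an assumption made to keep the example small), uses the error-counting Hamiltonian with $b = 0$ and $\kappa = 0$, and compares the directly computed mean error with $\alpha^{-1}\,\partial(\beta f)/\partial\beta$ obtained by a finite difference. Sizes and names are illustrative.

```python
import itertools
import math
import random

def error_counts(patterns, n):
    """H(J) = number of patterns with Delta <= 0, for every J in {-1, +1}^n."""
    counts = []
    for J in itertools.product((-1, 1), repeat=n):
        counts.append(sum(1 for S, xi in patterns
                          if xi * sum(j * s for j, s in zip(J, S)) <= 0))
    return counts

def beta_f(beta, levels, n):
    """beta * f = -ln(Z) / N for one fixed sample of patterns."""
    return -math.log(sum(math.exp(-beta * e) for e in levels)) / n

def mean_error(beta, levels, m):
    """Thermal average of H divided by the number of patterns M."""
    weights = [math.exp(-beta * e) for e in levels]
    return sum(w * e for w, e in zip(weights, levels)) / (sum(weights) * m)

rng = random.Random(2)
N, M = 9, 6
alpha = M / N
patterns = [([rng.gauss(0.0, 1.0) for _ in range(N)], rng.choice((-1, 1))) for _ in range(M)]
levels = error_counts(patterns, N)

beta, d = 1.5, 1e-5
eps_direct = mean_error(beta, levels, M)
# Central finite difference for (1/alpha) d(beta f)/d beta
eps_from_f = (beta_f(beta + d, levels, N) - beta_f(beta - d, levels, N)) / (2 * d * alpha)
# s = beta (alpha eps - f), written with beta*f to avoid dividing by beta
entropy = beta * alpha * eps_direct - beta_f(beta, levels, N)
print(eps_direct, eps_from_f, entropy)
```

With the unnormalized Ising prior the entropy per synapse obtained this way lies between 0 and $\ln 2$, in line with its interpretation as the logarithm of a number of coupling vectors.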

A case of special significance is when at $T = 0$ the mean error is zero, i.e., storage is perfect. Then the partition function becomes the integral $\Omega$ of the prior distribution over the version space, $\Omega = Z|_{T=0}$. If the prior distribution is uniform over the version space, as it is in the case of spherical normalization, then $\Omega$ is proportional to its volume. The zero temperature entropy measures this volume as $s|_{T=0} = N^{-1} \ln \Omega$.



C. Spherical and independently distributed synapses 

Before proceeding, we warn the reader that quantities of different types - variables, functions, functionals - may 
be denoted by the same symbol, the difference possibly shown by the type of the argument. An example is the free 
energy and the replica free energy, as shown below. Such practice will be limited to cases when there is little chance 
for confusion. 



In the case of the spherical constraint (3.5), the free energy per synapse was first proposed in [^[ jl8q| for the special error measure $V(y) = \theta(\kappa - y)$. The replica symmetric free energy was given by || for a general error measure $V(y)$. For a general $V(y)$ without the assumption of replica symmetry the free energy reads as

$$ f = \lim_{n\to 0} \frac{1}{n} \min_{Q} f(Q), \tag{3.17a} $$
$$ f(Q) = f_s(Q) + \alpha f_e(Q), \tag{3.17b} $$
$$ f_s(Q) = -(2\beta)^{-1} \ln\det Q, \tag{3.17c} $$
$$ f_e(Q) = -\beta^{-1} \ln \int \frac{d^n x\, d^n y}{(2\pi)^n} \exp\left(-\beta \sum_{a=1}^{n} V(y_a) + i\,xy - \tfrac{1}{2}\, xQx\right), \tag{3.17d} $$

see Appendix for derivation. Note that the minimum condition should be understood as the extremum in $Q$ together with non-negativity of the eigenvalues of the Hessian in terms of the matrix elements of $Q$. Marginal linear stability is allowed and, as we shall see later, will occur in some phases. Once the $Q$ matrix is appropriately parametrized, in terms of those parameters the extremum becomes a maximum for $n < 1$. So the minimum condition above is meant before such a parametrization is applied. These considerations deal with the consequences of our having interchanged the limits $N \to \infty$ and $n \to 0$.



The entropic term $f_s$, for which we used the concise form from [220], is specific to the spherical model. On the other hand, the energy term $f_e$, first displayed in [Q, is independent of the prior constraint for the synapses. The $n \times n$ matrix $Q$ has been introduced through the constraint H,188j

$$ [Q]_{ab} = q_{ab} = \frac{1}{N} \sum_{k=1}^{N} J_{ak} J_{bk}, \tag{3.18} $$

i.e., $Q$ is the matrix of the overlaps of the synaptic couplings; it is symmetric and positive semidefinite, with uniform diagonal elements $q_{aa} = q_D = 1$ and $-1 \le q_{ab} \le 1$. Here the indices $a, b = 1, \dots, n$ are so-called replica indices; a quantity with label $a$ belongs to the $a$-th factor in the power $Z^n$ of the partition function $Z$. Any quantity carrying a replica index is intimately related to the replica method, and its observability needs to be clarified separately. Only the off-diagonals, $q_{ab}$ with $a \ne b$, entail minimum conditions. Let us introduce the mean of some function $A(x, y)$ as

$$ \langle\langle A(x,y) \rangle\rangle_e = e^{\beta f_e(Q)} \int \frac{d^n x\, d^n y}{(2\pi)^n}\, A(x,y) \exp\left(-\beta \sum_{a=1}^{n} V(y_a) + i\,xy - \tfrac{1}{2}\, xQx\right), \tag{3.19} $$

where the prefactor ensures $\langle\langle 1 \rangle\rangle_e = 1$, and the subscript refers to the fact that the expectation value is associated with the energy term. Then, using

$$ \frac{\partial \ln\det Q}{\partial q_{ab}} = 2\,[Q^{-1}]_{ab}, \qquad a \ne b, \tag{3.20} $$

we obtain

$$ [Q^{-1}]_{ab} = \alpha\, \langle\langle x_a x_b \rangle\rangle_e, \qquad a \ne b, \tag{3.21} $$

as the extremum condition in terms of $q_{ab}$.
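The overlap matrix of Eq. (3.18) is easy to form explicitly for a handful of coupling vectors. The sketch below uses independent random spherical vectors as stand-ins for replicas, which is an assumption for illustration only: in the theory the replicas are thermally sampled copies sharing the same patterns. It checks the structural properties stated in the text, namely symmetry, unit diagonal, bounded elements, and positive semidefiniteness.

```python
import math
import random

def spherical_vector(n, rng):
    """Random coupling vector rescaled to |J| = sqrt(N)."""
    J = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(j * j for j in J))
    return [j * math.sqrt(n) / norm for j in J]

def overlap_matrix(replicas):
    """q_ab = (1/N) sum_k J_ak J_bk for a list of coupling vectors, one per replica."""
    n_syn = len(replicas[0])
    return [[sum(x * y for x, y in zip(Ja, Jb)) / n_syn for Jb in replicas]
            for Ja in replicas]

def quadratic_form(Q, v):
    """v Q v; non-negative for every v if Q is positive semidefinite."""
    return sum(Q[a][b] * v[a] * v[b] for a in range(len(v)) for b in range(len(v)))

rng = random.Random(3)
N, n_rep = 200, 4
replicas = [spherical_vector(N, rng) for _ in range(n_rep)]
Q = overlap_matrix(replicas)
v = [rng.gauss(0.0, 1.0) for _ in range(n_rep)]
print([round(Q[0][b], 3) for b in range(n_rep)], quadratic_form(Q, v))
```

For independent random vectors the off-diagonal overlaps are of order $N^{-1/2}$; nontrivial values of $q_{ab}$ arise only when the replicas are constrained by the same stored patterns.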
If the prior distribution is of the form (3.6) then

$$ f = \lim_{n\to 0} \frac{1}{n} \min_{Q}\, \mathop{\mathrm{extr}}_{\hat{Q}}\, f(Q, \hat{Q}), \tag{3.22a} $$
$$ f(Q, \hat{Q}) = f_i(Q, \hat{Q}) + f_s(\hat{Q}) + \alpha f_e(Q), \tag{3.22b} $$
$$ f_i(Q, \hat{Q}) = \frac{\beta}{2}\, \mathrm{Tr}\, Q\hat{Q}, \tag{3.22c} $$
$$ f_s(\hat{Q}) = -\beta^{-1} \ln \int \exp\left(\frac{\beta^2}{2}\, J\hat{Q}J\right) \prod_{a=1}^{n} w_0(J_a)\, dJ_a, \tag{3.22d} $$

and $f_e(Q)$ is given by (3.17d). The special case of Ising synapses (3.7) gives the free energy in [5,174]. Besides $Q$ there is now another symmetric auxiliary matrix, $\hat{Q}$, whose diagonals are $\hat{q}_{aa} = \hat{q}_D = 0$. The derivation of the above free energy is given in Appendix B. We emphasize that the type of extremum in $\hat{Q}$ is not restricted to minimum, see the argument below Eq. (B4).

The interaction term $f_i$ together with the entropic term, here $f_s$, at extremum corresponds to the entropic term (3.17c) in the spherical case. If we introduce the mean associated with the entropic term here as

$$ \langle\langle A(J) \rangle\rangle_s = e^{\beta f_s(\hat{Q})} \int A(J) \exp\left(\frac{\beta^2}{2}\, J\hat{Q}J\right) \prod_{a=1}^{n} w_0(J_a)\, dJ_a, \tag{3.23} $$

then the stationarity condition in terms of $\hat{q}_{ab}$ reads as

$$ q_{ab} = \langle\langle J_a J_b \rangle\rangle_s, \tag{3.24} $$

and that by $q_{ab}$ gives

$$ \beta^2 \hat{q}_{ab} = -\alpha\, \langle\langle x_a x_b \rangle\rangle_e. \tag{3.25} $$

Since the diagonals are not varied, the above equations should hold only for $a \ne b$. Note that in the limit $n \to 0$ the normalization coefficients in (3.19) and (3.23) each become 1, so for most purposes those formulae can be understood as if the coefficients were absent.



D. Neural stabilities, errors, and overlaps 



The probability distribution of the neural stability parameter $\Delta$ associated with stored patterns is given as |],[7|

$$ \rho(\Delta) = \langle \langle \delta(\Delta - h^1 \xi^1) \rangle_{th} \rangle_{qu}, \tag{3.26} $$

where the formula for $h$ in (3.1) is understood. Due to permutation symmetry among patterns there is no loss of generality in our selecting the first pattern in the definition above. The above definition is obviously independent of the replica method. That method, however, can be used to calculate the distribution of stabilities. Replacing $Z^{-1}$ by $Z^{n-1}$ in the thermal average in (3.26), keeping in mind that in the end the $n \to 0$ limit should be taken, we recognize that (3.26) is technically a slightly modified version of the partition function integral. The calculation is analogous to the derivation of (3.17), the latter shown in Appendix B, and we end up with

$$ \rho(\Delta) = \langle\langle \delta(\Delta - y_1) \rangle\rangle_e \big|_{n=0}, \tag{3.27} $$



where any replica index other than 1 could equally be chosen. Thus the average of an arbitrary function $U(y_a)$, with $a$ arbitrary but fixed, can be written in the form

$$ \langle\langle U(y_a) \rangle\rangle_e \big|_{n=0} = \int dy\, \rho(y)\, U(y), \tag{3.28} $$

where $\Delta$ as integration variable was replaced by $y$. An instructive formula for the mean error is obtained in terms of the distribution of neural stabilities as

$$ \varepsilon = \langle\langle V(y_1) \rangle\rangle_e \big|_{n=0} = \int dy\, \rho(y)\, V(y). \tag{3.29} $$

The first equality comes about from the definition (3.15), the energy term (3.17), and the notation (3.19), while the second follows from (3.28). In case of replica symmetry this expression goes over to the mean error displayed in [[|. In fact, (3.28) allows us to use error measures that are not related to the thermodynamic energy. One can take a $V(y)$ in the Hamiltonian and define the observable error by another measure $U(y)$. This was done, without assuming replica symmetry, with $U(y) = \theta(\kappa - y)$ in jjj. Eq. (3.29) holds for both constraints (3.5) and (3.6). In the second case this is due to the fact that $\beta^2$ can be absorbed into $\hat{q}_{ab}$; consequently $\beta$ times (3.22c) and (3.22d) both become $\beta$ independent. Thus only $f_e(Q)$ enters the thermodynamical formula (3.15) for the mean error, yielding (3.29).

The overlaps $q_{ab}$ emerged from the replica theory as auxiliary variables, with no prescription how to measure them. In analogy to spin glasses, where the Edwards-Anderson overlap of spins $q_{EA}$ has been defined independently of replicas |p9| , one can introduce an Edwards-Anderson overlap of synapses

$$ q_{EA} = \left\langle \frac{1}{N} \sum_{k=1}^{N} \langle J_k \rangle_{th}^2 \right\rangle_{qu}. \tag{3.30} $$

If the summation produces a self-averaging quantity, then the quenched average can be omitted from the definition, but eventually the same formula holds. Replacing the $Z^{-2}$ that appears in the thermal averages by $Z^{n-2}$, we can again apply the replica method. Attaching the $a = 1$ and $a = 2$ replica indices to the synaptic couplings in (3.30), and carrying out a calculation analogous to that in the case of the local stability distribution, we have

$$ q_{EA} = q_{12}. \tag{3.31} $$



This is valid for both the spherical and the independently distributed synapses. The ambiguity in this expression is obvious: there was some arbitrariness in selecting the 1st and 2nd replica indices for the synaptic couplings in (3.30) and labeling the other $n - 2$ replicas starting from $a = 3$. In the terminology of replica theory of spin glasses, the result (3.31) is in fact the overlap within one pure thermodynamical state |Q. A detailed study of the probability distributions of overlaps in multiple thermodynamical states (see p0| on the SK model) for the neuron is beyond our present scope. Nevertheless, in the analysis of correlation functions the consequences of ultrametricity as described in Ref. |2§fl will be recovered. Actually, for special error measures complex structures can arise in the neuron even without spontaneous replica symmetry breaking [214,125]. In summary, based on the manifold of analogies with spin glasses that we expound later in the paper, we expect that several aspects of the physical interpretation of the replica theory for spin glasses carry over to the storage problem of the neuron, even when in the latter replica symmetry is spontaneously broken.
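The step from the single-system definition of the Edwards-Anderson overlap to a two-replica overlap rests on the elementary identity $\langle J_k \rangle_{th}^2 = \langle J_k^1 J_k^2 \rangle$ for two independent copies with the same patterns. The sketch below verifies this by exact enumeration for a small Ising neuron and a single pattern set (no quenched average, an assumption to keep the example small); it is not the replica calculation itself, only the identity that motivates attaching two replica indices.

```python
import itertools
import math
import random

def boltzmann(patterns, n, beta):
    """All Ising coupling vectors with their normalized Boltzmann weights (b = 0, kappa = 0)."""
    states = list(itertools.product((-1, 1), repeat=n))
    weights = []
    for J in states:
        errors = sum(1 for S, xi in patterns
                     if xi * sum(j * s for j, s in zip(J, S)) <= 0)
        weights.append(math.exp(-beta * errors))
    z = sum(weights)
    return states, [w / z for w in weights]

def q_single_system(states, probs, n):
    """(1/N) sum_k <J_k>_th^2 for one pattern set."""
    means = [sum(p * J[k] for J, p in zip(states, probs)) for k in range(n)]
    return sum(m * m for m in means) / n

def q_two_replicas(states, probs, n):
    """Average of (1/N) sum_k J^1_k J^2_k over two independent thermal copies."""
    total = 0.0
    for J1, p1 in zip(states, probs):
        for J2, p2 in zip(states, probs):
            total += p1 * p2 * sum(a * b for a, b in zip(J1, J2)) / n
    return total

rng = random.Random(4)
N, M, beta = 7, 5, 2.0
patterns = [([rng.gauss(0.0, 1.0) for _ in range(N)], rng.choice((-1, 1))) for _ in range(M)]
states, probs = boltzmann(patterns, N, beta)
q1 = q_single_system(states, probs, N)
q2 = q_two_replicas(states, probs, N)
print(q1, q2)
```

Both numbers coincide up to rounding, which is the finite-size, single-sample counterpart of identifying the overlap of synapses with a replica overlap.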

A moral of this subsection is that the replica method enables not only the calculation of extensive quantities, like the free energy, but also the evaluation of local quantities. Technically, this is due to the fact that replicas are useful in taking the average of, besides the logarithm, also inverse powers of the partition function.

IV. THE PARISI SOLUTION



A. Finite replica symmetry breaking 

1. Recursive evaluation of the free energy term 



Below we resolve the "hard" terms in the free energies, namely, expressions (3.17d) and (3.22d). The derivation follows the spirit of Parisi's as described concisely for the SK model in Ref. |pcfl . The added aim here is to present the Parisi solution in a comprehensive, self-contained manner. Later we will be rewarded for this approach, because the calculation of expectation values will follow straightforwardly.

Our main concern here is

$$ \varphi[\Phi(y), Q] = \frac{1}{n} \ln \int \frac{d^n x\, d^n y}{(2\pi)^n} \exp\left(\sum_{a=1}^{n} \Phi(y_a) + i\,xy - \tfrac{1}{2}\, xQx\right). \tag{4.1} $$

Whereas this formula would look simpler if the Fourier transform of $e^{\Phi(y)}$ were used, we keep the above notation, because it is the function $\Phi(y)$ that will explicitly appear in the final evaluation. Both Eqs. (3.17d) and (3.22d) are of this type. Eq. (3.17d) corresponds to



$$ -\beta f_e(Q) = n\, \varphi[-\beta V(y), Q], \tag{4.2} $$

and (3.22d) is obtained by

$$ -\beta f_s(\hat{Q}) = n\, \varphi\left[\ln \int dx\, w_0(x)\, e^{-yx},\ \beta^2 \hat{Q}\right]. \tag{4.3} $$



Note that there are no a priori bounds for $\hat{q}_{ab}$ and the diagonal elements $\hat{q}_{aa}$ vanish. Furthermore, the function $w_0(x)$ is assumed to cut off sufficiently fast so that (4.3) exists. Later in this paper, when an integral expression is displayed and we do not discuss divergence, that means we assume the conditions of finiteness hold. This does not mean that under other conditions the expression could not diverge.

We will call (4.1) the "free energy term"; it is ubiquitous also in long range interaction spin glass models. We have just seen that the free energy of the neuron with arbitrary, independently distributed synapses contains two additive terms of the type of Eq. (4.1). We will see later that the spherical entropic term is also of this type, so the free energy of the spherical neuron is the sum of two terms of the type (4.1). Most of the mathematical parts of this paper are centered about the evaluation of expression (4.1).



Due to the absence of inherent topology in infinite range spin glass models, the replica approach led there to a single-site effective free energy. For such problems the ansatz of Parisi's turned out to be a very successful mathematical framework, presently the stepping stone to the field theory of the spin glass transition (see references in [221]). Since the present neuron problem is a priori single-site, it is reasonable to search for the $Q$ minimizing (3.17a) by using Parisi's hierarchical assumption, which reads as

$$ Q = \sum_{r=0}^{R+1} (q_r - q_{r-1})\, U_{m_r} \otimes 1_{n/m_r}, \tag{4.4} $$

where the subscript $k$ to a matrix marks that it is $k \times k$, $1_k$ is the unit matrix and $U_k$ has all elements equal to unity; furthermore,

$$ q_{-1} = 0, \qquad q_{R+1} = q_D, \tag{4.5a} $$
$$ m_{R+1} = 1 \le m_R \le m_{R-1} \le \dots \le m_1 \le m_0 = n, \tag{4.5b} $$

where the integer $m_r$ is a divisor of $m_{r-1}$. In the case of the $Q$ of Section III there is a presumed ordering

$$ q_{-1} = 0 \le q_0 \le q_1 \le \dots \le q_R \le q_{R+1} = 1. \tag{4.6} $$



In theory, $q_r < 0$ are also possible, but in our numerical explorations of examples such $q_r$-s did not appear, so we shall consider the restriction to nonnegative $q$-s part of the ansatz. For $\hat{Q}$ of Section III the assumption is

$$ \hat{q}_{-1} = 0 \le \hat{q}_0 \le \hat{q}_1 \le \dots \le \hat{q}_R, \qquad \hat{q}_{R+1} = 0. \tag{4.7} $$

These represent the $R$-step replica symmetry breaking scheme ($R$-RSB). At this stage we do not prescribe the ordering of $q_r$-s and allow uniform diagonals $q_{aa} = q_D$ of any magnitude.

The quadratic form in (4.1) is then

$$ xQx = \sum_{r=0}^{R+1} (q_r - q_{r-1}) \sum_{j_r=1}^{n/m_r} \left( \sum_{a=m_r(j_r-1)+1}^{m_r j_r} x_a \right)^2. \tag{4.8} $$
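The hierarchical structure of the ansatz can be made concrete for integer $n$ and $m_r$ by building the matrix explicitly and checking the block-sum form of the quadratic form against a direct evaluation. The sketch below is illustrative: the function names and the example values of $n$, $m_r$, $q_r$ are assumptions, and the blocks are laid out as consecutive index ranges, one of the equivalent orderings of the replica labels. It also evaluates the parameters $x_r = (n - m_r)/(n - 1)$, which interpolate between 0 and 1.

```python
def parisi_matrix(n, m, q):
    """Hierarchical matrix: Q_ab = sum of (q_r - q_{r-1}) over levels r whose blocks contain both a and b.

    m = [m_0, ..., m_{R+1}] with m_0 = n, m_{R+1} = 1, each m_r dividing m_{r-1};
    q = [q_0, ..., q_{R+1}] with q_{R+1} the diagonal value q_D and q_{-1} = 0.
    """
    Q = [[0.0] * n for _ in range(n)]
    q_prev = 0.0
    for m_r, q_r in zip(m, q):
        for a in range(n):
            for b in range(n):
                if a // m_r == b // m_r:        # a and b lie in the same block of size m_r
                    Q[a][b] += q_r - q_prev
        q_prev = q_r
    return Q

def quadratic_form_blocks(x, n, m, q):
    """Squared block sums weighted by q_r - q_{r-1}, the block form of x Q x."""
    total, q_prev = 0.0, 0.0
    for m_r, q_r in zip(m, q):
        for j in range(n // m_r):
            total += (q_r - q_prev) * sum(x[j * m_r:(j + 1) * m_r]) ** 2
        q_prev = q_r
    return total

def x_parameters(n, m):
    """x_r = (n - m_r) / (n - 1) for integer n > 1."""
    return [(n - m_r) / (n - 1) for m_r in m]

n, m, q = 8, [8, 4, 2, 1], [0.1, 0.3, 0.6, 1.0]      # a 2-step (R = 2) example
Q = parisi_matrix(n, m, q)
x = [1.0, -2.0, 0.5, 3.0, -1.0, 2.0, 0.0, 1.5]
direct = sum(Q[a][b] * x[a] * x[b] for a in range(n) for b in range(n))
blocks = quadratic_form_blocks(x, n, m, q)
print(direct, blocks, x_parameters(n, m))
```

In this example the overlap between replicas 0 and 1 (same innermost block) is $q_2$, between 0 and 2 it is $q_1$, and between 0 and 4 it is $q_0$, which is the ultrametric pattern encoded by the ansatz.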



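The $\varphi[\Phi(y), Q]$ of (4.1) should thus be replaced by $\varphi[\Phi(y), q, m]$, where the parameters in (4.4) are considered to be the elements of the vectors in the argument. By using the notation

$$ Dz = \frac{dz}{\sqrt{2\pi}}\, e^{-z^2/2} \tag{4.9} $$

and the identity

$$ e^{-\frac{1}{2} x^2} = \int Dz\, e^{-ixz} \tag{4.10} $$

we obtain

$$ e^{n\varphi[\Phi(y), q, m]} = \int \prod_{r=0}^{R+1} \prod_{j_r=1}^{n/m_r} Dz^{(r)}_{j_r} \int \frac{d^n x\, d^n y}{(2\pi)^n} \exp\left( -i \sum_{r=0}^{R+1} \sqrt{q_r - q_{r-1}} \sum_{j_r=1}^{n/m_r} z^{(r)}_{j_r} \sum_{a=m_r(j_r-1)+1}^{m_r j_r} x_a + i \sum_{a=1}^{n} x_a y_a + \sum_{a=1}^{n} \Phi(y_a) \right). \tag{4.11} $$

Appendix C shows that the above expression equals Eq. (C6).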

The limit $m_0 = n \to 0$ violates the ordering in (4.5b). In fact, experience in spin glasses HQ and in $R$-RSB, $R = 1, 2$ calculations in neural networks (see 0>|ll|) suggests that the $m_r$-s become less than 1 and the ordering in (4.5b) is to be reversed. This can be understood by our introducing

$$ x_r = \frac{n - m_r}{n - 1} \tag{4.12} $$

for arbitrary $n$ and using the $x_r$-s for parametrization instead of the $m_r$-s. The new parameter $x_r$ should not be confused with the integration variable $x_a$ in Eq. (4.1). For integer $n$ and $m_r$-s satisfying (4.5b) we have the ordering

$$ x_{R+1} = 1 \ge x_R \ge x_{R-1} \ge \dots \ge x_1 \ge x_0 = 0. \tag{4.13} $$

Keeping the $x_r$-s fixed as $n \to 0$ defines the $n$-dependence of the $m_r$-s, and for $n = 0$ formally we get $x_r = m_r$. This explains the aforementioned practice to treat the $m_r$-s as real numbers in $[0, 1]$ with ordering reversed w.r.t. (4.5b). Eq. (C6) becomes for $n \to 0$, in terms of the $x_r$-s,

$$ \varphi[\Phi(y), q, x]\big|_{n=0} = \frac{1}{x_1} \int Dz_0 \ln \int Dz_1 \left[ \int Dz_2 \cdots \left[ \int Dz_{R+1} \exp \Phi\left( \sum_{r=0}^{R+1} z_r \sqrt{q_r - q_{r-1}} \right) \right]^{x_R/x_{R+1}} \cdots \right]^{x_1/x_2}. \tag{4.14} $$

This is the general formula for $R$-RSB. Expression (4.14) can be written in the form of an iteration for decreasing $r$-s as

$$ \psi_{r-1}(y) = \int Dz\, \psi_r\!\left(y + z\sqrt{q_r - q_{r-1}}\right)^{x_r/x_{r+1}}, \tag{4.15a} $$
$$ \psi_R(y) = \int Dz\, e^{\Phi\left(y + z\sqrt{q_{R+1} - q_R}\right)}, \tag{4.15b} $$

or, we can set $x_{R+2} = 1$ and put the initial condition as

$$ \psi_{R+1}(y) = e^{\Phi(y)}. \tag{4.16} $$

In the iterated function we omitted to mark the functional dependence on $\Phi(y)$ and $q, x$. If a $q_r - q_{r-1} < 0$ then the square root is imaginary. Since the Gauss measure of integrations suppresses odd powers in a Taylor expansion of the integrand, the result, if the integrals exist, will be real. The case of a non-monotonic $q_r$ sequence will be briefly discussed at the end of this section. Then

$$ \varphi[\Phi(y), q, x]\big|_{n=0} = \frac{1}{x_1} \int Dz\, \ln \psi_0\!\left(z\sqrt{q_0}\right). \tag{4.17} $$

Note that an iteration like (4.15) can be also understood, before the $n \to 0$ limit is taken, directly on Eq. (C6), where $m_r/m_{r+1}$ is integer. Then formally $\varphi[\Phi(y), q, x] = n^{-1} \ln \psi_{-1}(0)$. Hence for $n \to 0$ we recover (4.17). It is, however, an advantage that we can first take $n \to 0$ and then define the recursion (4.15) with fractional powers. Indeed, while dealing with the consequences of the recursion, the replica limit $n \to 0$ is implied and we do not have to return to the question of that limit again.

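It is instructive to introduce

$$ \varphi_r(y) = \frac{1}{x_{r+1}} \ln \psi_r(y), \tag{4.18} $$

lending itself to the recursion

$$ \varphi_{r-1}(y) = \frac{1}{x_r} \ln \int Dz\, e^{x_r \varphi_r\left(y + z\sqrt{q_r - q_{r-1}}\right)}, \tag{4.19a} $$
$$ \varphi_{R+1}(y) = \Phi(y), \tag{4.19b} $$

and yielding

$$ \varphi[\Phi(y), q, x]\big|_{n=0} = \int Dz\, \varphi_0\!\left(z\sqrt{q_0}\right). \tag{4.20} $$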
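The recursion for the logarithmic fields lends itself to direct numerical evaluation for small $R$. The sketch below implements it by nested Gaussian quadrature on a uniform grid, assuming the form $\varphi_{r-1}(y) = x_r^{-1} \ln \int Dz\, \exp[x_r \varphi_r(y + z\sqrt{q_r - q_{r-1}})]$ with $\varphi_{R+1} = \Phi$ and the final average $\int Dz\, \varphi_0(z\sqrt{q_0})$. The quadrature rule, the function names, and the linear test function are assumptions made for illustration; the cost grows exponentially with $R$, so this is a check for $R \le 2$ rather than a production method. For linear $\Phi(y) = a y$ the Gaussian integrals can be done by hand and give $(a^2/2) \sum_{r=1}^{R+1} x_r (q_r - q_{r-1})$, which serves as a test.

```python
import math

def gauss_nodes(zmax=8.0, step=0.25):
    """Uniform-grid nodes and weights approximating Dz = dz exp(-z^2/2) / sqrt(2 pi)."""
    count = int(round(2 * zmax / step))
    nodes = [-zmax + i * step for i in range(count + 1)]
    weights = [step * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) for z in nodes]
    return nodes, weights

def free_energy_term(Phi, q, x):
    """Evaluate the n -> 0 free energy term by the phi_r recursion.

    q = [q_0, ..., q_{R+1}] (increasing), x = [x_1, ..., x_{R+1}] with x_{R+1} = 1.
    """
    nodes, weights = gauss_nodes()
    top = len(q) - 1                      # index R + 1

    def phi(r, y):
        if r == top:
            return Phi(y)                 # initial condition phi_{R+1} = Phi
        s = math.sqrt(q[r + 1] - q[r])
        x_next = x[r]                     # this is x_{r+1}
        total = sum(w * math.exp(x_next * phi(r + 1, y + z * s))
                    for z, w in zip(nodes, weights))
        return math.log(total) / x_next

    s0 = math.sqrt(q[0])
    return sum(w * phi(0, z * s0) for z, w in zip(nodes, weights))

a = 0.7
q = [0.2, 0.5, 1.0]                       # q_0, q_1, q_2 = 1 (1-step RSB)
x = [0.4, 1.0]                            # x_1, x_2 = 1
numeric = free_energy_term(lambda y: a * y, q, x)
closed = 0.5 * a * a * sum(x[r - 1] * (q[r] - q[r - 1]) for r in range(1, len(q)))
print(numeric, closed)
```

Replacing the linear test function by, for example, $\Phi(y) = -\beta\,\theta(\kappa - y)$ would give the energy term for the error-counting potential at the chosen parameters, although a discontinuous $\Phi$ calls for a finer grid than the smooth test case.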



2. Parisi's PDE 

The above recursions can be viewed as diffusion processes in the presence of "kicks". Let us introduce here Parisi's order parameter function (OPF) as

$$ x(q) = \sum_{i=0}^{R} (x_{i+1} - x_i)\, \theta(q - q_i), \tag{4.21} $$

defined on the interval $[0, 1]$, where (4.6) and (4.13) are understood. With the standard notation

$$ f(q^{\pm 0}) = \lim_{\epsilon \to +0} f(q \pm \epsilon), \tag{4.22} $$

we have obviously

$$ x(q_r^{+0}) = x_{r+1}, \qquad x(q_r^{-0}) = x_r, \tag{4.23} $$

and we may set

$$ x(q_r) = x_{r+1}. \tag{4.24} $$

Next we introduce the field $\psi(q, y)$ such that at $q_r$ it has the discontinuity

$$ \psi(q_r^{+0}, y) = \psi_r(y), \tag{4.25a} $$
$$ \psi(q_r^{-0}, y) = \psi_r(y)^{x_r/x_{r+1}}. \tag{4.25b} $$

In other words,

$$ \psi(q, y)^{1/x(q)} \tag{4.26} $$

is continuous in $q$. We may set at the discontinuity

$$ \psi(q_r, y) = \psi_r(y). \tag{4.27} $$

A graphic reminder of the way $x(q)$ and $\psi(q, y)$ are defined at the discontinuity is shown in FIG. 1. Note that $r$ was converted to $q$ differently for $x_r$ and $\psi_r$, cf. Eqs. (4.24) and (4.27). All fields appearing below follow the convention (4.24), (4.27).
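The conventions at the steps can be spelled out in a few lines of code. The sketch below evaluates a step-like OPF with the value at a step taken from the right, $x(q_r) = x_{r+1}$, and checks that the jump of the field, assumed here in the form $\psi(q_r^{-0}, y) = \psi_r(y)^{x_r/x_{r+1}}$, leaves $\psi^{1/x}$ continuous. Function names and numerical values are illustrative assumptions.

```python
def opf(q, q_steps, x_steps):
    """Step-like order parameter function with theta(0) = 1, so that x(q_r) = x_{r+1}.

    q_steps = [q_0, ..., q_R]; x_steps = [x_0, ..., x_{R+1}] with x_0 = 0 and x_{R+1} = 1.
    """
    return sum(x_steps[i + 1] - x_steps[i] for i, q_i in enumerate(q_steps) if q >= q_i)

def psi_left_limit(psi_r, x_r, x_r_plus_1):
    """Value of the field just below a step, given its value psi_r just above it."""
    return psi_r ** (x_r / x_r_plus_1)

q_steps = [0.2, 0.5]           # q_0, q_1 for a 1-step example
x_steps = [0.0, 0.4, 1.0]      # x_0, x_1, x_2

values = [opf(qq, q_steps, x_steps) for qq in (0.1, 0.2, 0.3, 0.5, 0.9)]
print(values)

# Continuity of psi^(1/x) across the step at q_1, where x jumps from x_1 to x_2
psi_above = 2.5                                 # arbitrary positive test value of psi_1(y)
psi_below = psi_left_limit(psi_above, x_steps[1], x_steps[2])
print(psi_below ** (1.0 / x_steps[1]), psi_above ** (1.0 / x_steps[2]))
```

The two numbers printed last agree, which is the content of the statement that the $1/x(q)$-th power of the field is continuous in $q$.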



FIG. 1. Schematic behavior of $x(q)$, $\psi(q, y)$, and $\varphi(q, y)$ at a discontinuity point $q_r$. A fixed $y$ is assumed. The function $\varphi(q, y)$ is continuous in $q$ but has a discontinuous derivative. The two limits of $\psi(q_r, y)$ are related through Eq. (4.25). The circles are placed where the function value is not taken as the limit.



In the interval $(q_{r-1}, q_r)$ we define the $\psi(q, y)$ based on (4.15a) as

$$ \psi(q, y) = \int Dz\, \psi\!\left(q_r^{-0},\, y + z\sqrt{q_r - q}\right), \tag{4.28} $$

ensuring that (4.25a) holds for $r \to r - 1$. Relation (4.28) says that $\psi(q, y)$ evolves in the open interval from $q_r$ to $q_{r-1}$ by the linear diffusion equation

$$ \partial_q \psi = -\tfrac{1}{2}\, \partial_y^2 \psi. \tag{4.29} $$



Near the discontinuity of $x(q)$ another differential equation can be derived. Let us differentiate Eq. (4.26) by $q$ as

$$ \partial_q\, \psi^{1/x} = \frac{1}{x}\, \psi^{1/x - 1} \left( \partial_q \psi - \frac{\dot{x}}{x}\, \psi \ln \psi \right). \tag{4.30} $$

Since $\psi(q, y)^{1/x(q)}$ is continuous in $q$ while $\psi(q, y)$ and $x(q)$ are not, the two singular derivatives on the r.h.s. must cancel in leading order. Hence we obtain

$$ \partial_q \psi = \frac{\dot{x}}{x}\, \psi \ln \psi \tag{4.31} $$

in an infinitesimal neighborhood of $q_r$. The above derivation is apparently unfounded, because at a discontinuity the rules of differentiation used in (4.30) lose meaning. However, considering (4.31) at a fixed $y$ as an ordinary differential equation separable in $q$ helps us through the discontinuity, and we obtain

$$ \int_{\psi(q_r^{-0}, y)}^{\psi(q_r^{+0}, y)} \frac{d\psi}{\psi \ln \psi} = \int_{x(q_r^{-0})}^{x(q_r^{+0})} \frac{dx}{x}. \tag{4.32} $$

The integrals yield

$$ \ln \ln \psi(q, y) \Big|_{q_r^{-0}}^{q_r^{+0}} = \ln x(q) \Big|_{q_r^{-0}}^{q_r^{+0}}, \tag{4.33} $$

whence by exponentiating twice we recover the continuity condition for (4.26). In conclusion, for a discontinuous $x(q)$ equation (4.31) can indeed be interpreted as the differential form of the prescription that (4.26) is continuous in $q$. Concatenation of (4.29) and (4.31) gives, with regard to the initial condition (4.16), the PDE

$$ \partial_q \psi = -\tfrac{1}{2}\, \partial_y^2 \psi + \frac{\dot{x}}{x}\, \psi \ln \psi, \tag{4.34a} $$
$$ \psi(1, y) = e^{\Phi(y)}. \tag{4.34b} $$

Indeed, at a $q_r$ the $\dot{x}(q)$ is singular, so the second term on the r.h.s. dominates and we recover (4.31), whereas within an interval $\dot{x}(q) \equiv 0$ and thus (4.29) holds.



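The transformation analogous to (4.18) is

$$ \varphi(q, y) = \frac{\ln \psi(q, y)}{x(q)} \tag{4.35} $$

and gives rise to

$$ \partial_q \varphi = -\tfrac{1}{2}\, \partial_y^2 \varphi - \tfrac{1}{2}\, x\, (\partial_y \varphi)^2, \tag{4.36a} $$
$$ \varphi(1, y) = \Phi(y). \tag{4.36b} $$

It follows that when $x(q)$ has a finite discontinuity then the field $\varphi(q, y)$ is continuous in $q$, as in FIG. 1. This is in accordance with the condition that formula (4.26) is continuous.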
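The passage from the equation for $\psi$ to the one for $\varphi$ is a short substitution, sketched here under the assumption that the $\psi$-equation reads $\partial_q\psi = -\frac{1}{2}\partial_y^2\psi + (\dot{x}/x)\,\psi\ln\psi$ and that the fields are related by $\psi = e^{x\varphi}$:

```latex
\begin{aligned}
\partial_q \psi &= \left(\dot{x}\,\varphi + x\,\partial_q\varphi\right)\psi, \\
\partial_y^2 \psi &= \left[x\,\partial_y^2\varphi + x^2\left(\partial_y\varphi\right)^2\right]\psi, \\
\frac{\dot{x}}{x}\,\psi\ln\psi &= \dot{x}\,\varphi\,\psi .
\end{aligned}
```

Inserting these into the $\psi$-equation, the terms $\dot{x}\,\varphi\,\psi$ cancel on the two sides, and division by $x\psi$ leaves $\partial_q\varphi = -\frac{1}{2}\partial_y^2\varphi - \frac{1}{2}\,x\,(\partial_y\varphi)^2$. The singular $\dot{x}$ term thus drops out, which is why $\varphi$ stays continuous across a step of $x(q)$.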

The PDE (4.36a) can be rewritten via the transformation $q = q(x)$ to one evolving in $x$, a PDE first proposed by Parisi with a special initial condition for the SK model. In this paper we refer to (4.36) and its equivalents as Parisi's PDE, PPDE for short.

When $x(q) \equiv$ const., differentiation of the PPDE (4.36) in terms of $y$ gives the Burgers equation for the field $\partial_y \varphi$. Then the derivative of Eq. (4.35) by $y$ corresponds to the Cole-Hopf transformation formula [222,223], which converts the Burgers equation into the PDE for linear diffusion ]^], here (4.34) with $\dot{x} \equiv 0$. If $x(q)$ is not a constant, (4.35) connects two nonlinear PDE-s. We shall refer to (4.35) as Cole-Hopf transformation.
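For constant $x$ the linearizing property of the transformation can be checked numerically. The sketch below takes the linear-diffusion solution $\psi(q, y) = \int Dz\, \exp[x\,\Phi(y + z\sqrt{1 - q})]$, forms $\varphi = x^{-1}\ln\psi$, and estimates by finite differences the residual of the nonlinear equation, assumed here in the form $\partial_q\varphi + \frac{1}{2}\partial_y^2\varphi + \frac{1}{2}x(\partial_y\varphi)^2 = 0$. The test function $\Phi(y) = \ln(2\cosh y)$, the quadrature grid, and all names are choices made for this illustration.

```python
import math

def gauss_average(f, zmax=8.0, step=0.05):
    """Approximate the Gaussian average int Dz f(z) by a uniform-grid sum."""
    count = int(round(2 * zmax / step))
    total = 0.0
    for i in range(count + 1):
        z = -zmax + i * step
        total += step * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) * f(z)
    return total

def Phi(y):
    """Smooth test initial condition (an arbitrary choice for this check)."""
    return math.log(2.0 * math.cosh(y))

def phi(q, y, x):
    """phi(q, y) = (1/x) ln int Dz exp(x Phi(y + z sqrt(1 - q))) for constant x."""
    s = math.sqrt(1.0 - q)
    return math.log(gauss_average(lambda z: math.exp(x * Phi(y + z * s)))) / x

def ppde_residual(q, y, x, h=1e-2):
    """Finite-difference estimate of d_q phi + (1/2) d_y^2 phi + (x/2) (d_y phi)^2."""
    dq = (phi(q + h, y, x) - phi(q - h, y, x)) / (2 * h)
    dy = (phi(q, y + h, x) - phi(q, y - h, x)) / (2 * h)
    dyy = (phi(q, y + h, x) - 2 * phi(q, y, x) + phi(q, y - h, x)) / (h * h)
    return dq + 0.5 * dyy + 0.5 * x * dy * dy

print(ppde_residual(0.5, 0.3, 0.4))
print(phi(0.5, 0.3, 1.0), math.log(2.0 * math.cosh(0.3)) + 0.25)
```

The residual is small, of the order of the finite-difference error, and for $x = 1$ the result reproduces the elementary value $\ln(2\cosh y) + (1 - q)/2$ that follows from $\int Dz\, 2\cosh(y + zs) = 2\cosh(y)\, e^{s^2/2}$.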



In the case of a discontinuous initial condition $\Phi(y)$ the Cole-Hopf transformation (4.35) connects two discontinuous functions at $q = 1$, while generically diffusion smoothens the discontinuity for $q < 1$. Even if we succeed in defining the PDE-s for non-differentiable initial conditions, the equivalence of (4.34) and (4.36) is doubtful. In case of ambiguity precedence is taken by the PDE (4.34), which directly follows from the recursion (4.15). The question of discontinuity in the initial condition will be discussed later.



Our main focus is the term (4.20), now also a functional of $x(q)$,

$$ \varphi[\Phi(y), x(q)] = \int Dz\, \varphi\!\left(q_0,\, z\sqrt{q_0}\right), \tag{4.37} $$

where $n = 0$ is implied. Note that $x(q) \equiv 0$ in the interval $(0, q_0)$, so (4.36a) becomes the PDE for linear diffusion, whose solution at $q = 0$ is given by the r.h.s. of (4.37), thus

$$ \varphi[\Phi(y), x(q)] = \varphi(0, 0). \tag{4.38} $$

Footnote: E. Ott has kindly called our attention to the Cole-Hopf transformation.

In the above PDE-s q is a time-like variable evolving from 1 to 0. In the context of the PDE-s we will refer to q 
as time, and ordinary derivative by q will be denoted by a dot. The above PDE-s can be considered as nonlinear 
diffusion equations in reverse time direction. 



Next we study the case of $\hat{Q}$ with Parisi elements obeying (4.7). Then the PDE obtained for the field $\psi(\hat{q}, y)$ by continuation contains the function $x(\hat{q})$. We obtain

$$ \partial_{\hat{q}} \psi = -\tfrac{1}{2}\, \partial_y^2 \psi + \frac{\dot{x}}{x}\, \psi \ln \psi, \tag{4.39a} $$
$$ \psi(\hat{q}_R, y) = \int Dz\, e^{\Phi\left(y + iz\sqrt{\hat{q}_R}\right)}, \tag{4.39b} $$

where $\psi(\hat{q}_R, y)$ is real due to the symmetry of the $Dz$ measure. Alternatively, with the Cole-Hopf transformation (4.35), we have

$$ \partial_{\hat{q}} \varphi = -\tfrac{1}{2}\, \partial_y^2 \varphi - \tfrac{1}{2}\, x\, (\partial_y \varphi)^2, \tag{4.40a} $$
$$ \varphi(\hat{q}_R, y) = \ln \int Dz\, e^{\Phi\left(y + iz\sqrt{\hat{q}_R}\right)}. \tag{4.40b} $$

The existence of the integral is a sensitive question here, because the imaginary term in the argument expresses the fact that $\exp \Phi$ is evolved by backward diffusion. The meaningfulness of the above initial condition should be checked case-by-case. Then the sought term is

$$ \varphi[\Phi(y), \hat{Q}] = \varphi[\Phi(y), x(\hat{q})] = \varphi(0, 0). \tag{4.41} $$

In contrast to the PDE-s associated with the matrix $Q$ of naturally bounded elements, where the time span of the evolution is the unit interval, in the case of $\hat{Q}$ the PDE-s' evolution interval is not fixed a priori. Now $\hat{q}$ goes from $\hat{q}_R$ to 0, where $\hat{q}_R$ itself is a thermodynamical variable subject to extremization.

Finally we emphasize that the recursive technique may be able to treat non-monotonic $q_r$ sequences. Indeed, if $q_r < q_{r-1}$ then an imaginary term would multiply $z$ in the integrand on the r.h.s. of (4.15a), but the l.h.s. would have a real function. If the integrals involved exist then there is no obstacle to extending the theory to non-monotonic $q_r$-s. Such a case did not, however, arise in our explorations. As we shall see in Section VI A, the OPF $x(q)$ is a probability measure, a property that non-monotonicity would contradict. On the other hand, $\hat{Q}$ can be considered as associated with a non-monotonic $\hat{q}_r$ sequence. Its diagonals vanish, $\hat{q}_{aa} = \hat{q}_{R+1} = 0$, and so the step from $\hat{q}_R$ to $\hat{q}_{R+1} = 0$ goes against the trend of the otherwise supposedly monotonically increasing $\hat{q}_r$ sequence, $r = 0, \dots, R$. Accordingly, an imaginary factor of $z$ appears on the r.h.s. of (4.40b), and the recursion is as meaningful as it was in the case of a monotonic $q_r$ sequence.
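That a Gaussian average with an imaginary shift can yield a real result, and that it acts as backward diffusion, is easily seen on an example. The sketch below evaluates $\int Dz\, e^{\Phi(y + iz\sqrt{\hat{q}})}$ numerically for the test choice $e^{\Phi(y)} = 2\cosh y$, which is an assumption made for this illustration. Since $\cosh(y + izs) = \cosh y \cos(zs) + i \sinh y \sin(zs)$, the odd imaginary part averages to zero and the result is $2\cosh(y)\, e^{-\hat{q}/2}$, to be contrasted with the factor $e^{+\hat{q}/2}$ that a real shift would give.

```python
import cmath
import math

def gauss_average_complex(f, zmax=8.0, step=0.05):
    """Approximate int Dz f(z) for a complex-valued integrand on a symmetric uniform grid."""
    count = int(round(2 * zmax / step))
    total = 0j
    for i in range(count + 1):
        z = -zmax + i * step
        total += step * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) * f(z)
    return total

def psi_initial(exp_Phi, y, q_hat):
    """int Dz exp(Phi(y + i z sqrt(q_hat))), with exp(Phi) supplied as an analytic function."""
    s = math.sqrt(q_hat)
    return gauss_average_complex(lambda z: exp_Phi(y + 1j * z * s))

def exp_Phi(u):
    """Test function exp(Phi(u)) = 2 cosh(u) (illustrative assumption)."""
    return 2.0 * cmath.cosh(u)

y, q_hat = 0.3, 0.8
value = psi_initial(exp_Phi, y, q_hat)
expected = 2.0 * math.cosh(y) * math.exp(-0.5 * q_hat)
print(value, expected)
```

For functions $e^{\Phi}$ that grow too fast along the imaginary direction the integral may fail to exist, which is the case-by-case caveat attached to initial conditions with an imaginary argument.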

The generalization of the picture above is straightforward to an order parameter with more components, when the structure of the free energy term remains essentially the same. We briefly discuss this case in Appendix E.



B. Finite and continuous replica symmetry breaking 

1. The continuous limit 



If the minimum of the free energy is found at an OPF given in (4.21) with $R = \infty$, then the $q_r$-s accumulate infinitesimally densely in some region. If this happens in an interval, the OPF $x(q)$ is expected to increase there strictly monotonically, given its physical interpretation as the mean probability distribution of the overlaps, as discussed in Section VI A. Within that interval the recursions go over to the PDE-s of Section IV A 2, which can be obtained in this case in the spirit of the approach of Ref. [224], as discussed in the Appendix. In other regions in $q$, where $x(q)$ remains a step function, the recursions discussed in Section IV A 1 can be used, but the PDE-s are also still valid, as described in Section IV A 2. In either case, the PDE-s are applicable independently of whether the minimizing OPF is continuous or step-like.

In physical systems so far, including spin glass and neural network models, out of finite $R$-s only $R = 0$ and $R = 1$ RSB phases were found thermodynamically stable. The significance of $2 \le R < \infty$ RSB seems to be in approximating the $R = \infty$ case. Generically, both finite and infinite $R$ states are characterized by the border values



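$$ q_{(0)} = \begin{cases} q_0 & \text{if } R < \infty, \\ \lim_{R\to\infty} q_0 & \text{if } R = \infty, \end{cases} \tag{4.42a} $$

$$ q_{(1)} = \begin{cases} q_R & \text{if } R < \infty, \\ \lim_{R\to\infty} q_R & \text{if } R = \infty, \end{cases} \tag{4.42b} $$

where $q_{(0)} \ge 0$ and $q_{(1)} \le 1$. These are delimiters of the trivial plateaus of the OPF $x(q)$ as

$$ x(q) = \begin{cases} 0 & \text{if } 0 \le q < q_{(0)}, \\ 1 & \text{if } 1 \ge q \ge q_{(1)}. \end{cases} \tag{4.43} $$

The border values (4.42) apply to both the finite and infinite $R$ cases, the difference remaining in the shape of the OPF $x(q)$ within the interval $(q_{(0)}, q_{(1)})$. Here we assumed $q_{R+1} = q_D = 1$; this makes the $q = 1$ value special, and we will use that in the general discussion.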

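When $R = \infty$ a typical situation is when extremization of the free energy yields

$$ x(q) = \begin{cases} 0 & \text{if } 0 \le q < q_{(0)}, \\ x_c(q) & \text{if } q_{(0)} \le q < q_{(1)}, \\ 1 & \text{if } q_{(1)} \le q \le 1. \end{cases} \tag{4.44} $$

Here

$$ 0 \le x_{(0)} \le x_c(q) \le x_{(1)} \le 1, \qquad 0 < \dot{x}_c(q) < \infty, $$

where $x_{(0)}$ and $x_{(1)}$ denote the values of $x_c(q)$ at the two ends of the segment. In words, the OPF has a strictly increasing, continuous segment $x_c(q)$ between the border values (4.42).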

The case with an OPF having a smooth, strictly increasing segment $x_c(q)$ will be referred to as continuous RSB (CRSB). Obviously, CRSB always implies $R \to \infty$. In principle, the OPF may then be more complicated than (4.44), e. g., there may be nontrivial ($x \neq 0, 1$) plateaus and several $x_c(q)$ segments separated by them. So far, however, no system was found whose replica solution involved more than one strictly increasing segment $x_c(q)$, separated by a plateau or a discontinuity.

In what follows we will use the term "continuation" when we mean the $n \to 0$ limit and the usage of $x(q)$ based on Eqs. (4.12, 4.21); the term allows for but does not necessarily imply CRSB.

If the OPF in question is x(q), defined analogously to (4.21) with the parameters {q r ,x r }, continuation goes along 
similar lines. 



2. Derivatives of Parisi's PDE



The iterations derived in Section IV A 1 only describe finite $R$-RSB, including the $R = 0$ replica symmetric case, while the PDE-s incorporate both finite and continuous RSB. We therefore study the PDE-s.

For later purposes it is worth summarizing some PDE-s related to the PPDE (4.36) and its Cole-Hopf transformed Eq. (4.34). The field

$$ \mu(q,y) = \partial_y \varphi(q,y) \tag{4.45} $$

satisfies the PDE

$$ \partial_q \mu = -\tfrac{1}{2}\,\partial_y^2 \mu - x\,\mu\,\partial_y \mu, \tag{4.46a} $$

$$ \mu(1,y) = \Phi'(y), \tag{4.46b} $$

obtained from the PPDE by differentiation in terms of $y$. One more differentiation introduces

$$ \kappa(q,y) = \partial_y \mu(q,y), \tag{4.47} $$

which evolves according to

$$ \partial_q \kappa = -\tfrac{1}{2}\,\partial_y^2 \kappa - x\left(\kappa^2 + \mu\,\partial_y \kappa\right), \tag{4.48a} $$

$$ \kappa(1,y) = \Phi''(y). \tag{4.48b} $$

Note that while the PPDE (4.36) and Eq. (4.46) are self-contained equations, in principle solvable for the respective fields, (4.48) is not such and should rather be considered as a relation between the fields $\mu(q,y)$ and $\kappa(q,y)$.

The Cole-Hopf transformation for the first derivative $\mu(q,y)$ can be conveniently defined as the field $\mu(q,y)\,\psi(q,y)$. This can be further differentiated to produce the Cole-Hopf transformed field for $\kappa(q,y)$. The PDE-s for the transformed fields each reduce to the linear diffusion equation along plateaus of $x(q)$.



3. Linearized PDE-s and their adjoints



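As we shall see, in the calculation of expectation values linear PDE-s associated with the above equations play an important role. A perturbation $\varphi(q,y) + \epsilon\,\vartheta(q,y)$ around a known solution $\varphi(q,y)$ of the PPDE itself satisfies the PPDE to $O(\epsilon)$ if

$$ \partial_q \vartheta = -\tfrac{1}{2}\,\partial_y^2 \vartheta - x\,\mu\,\partial_y \vartheta. \tag{4.49} $$

This equation is satisfied by $\mu(q,y)$ with initial condition (4.46b). The field

$$ \eta(q,y) = \partial_y \vartheta(q,y) \tag{4.50} $$

then evolves according to

$$ \partial_q \eta = -\tfrac{1}{2}\,\partial_y^2 \eta - x\,\partial_y(\mu\,\eta), \tag{4.51} $$

obviously satisfied by $\kappa(q,y)$ if the initial condition is specified by (4.48b).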

The field $P(q,y)$ adjoint to $\vartheta(q,y)$ and crucial in the computation of expectation values can be introduced by the requirement that

$$ \int dy\, P(q,y)\,\vartheta(q,y) \tag{4.52} $$

be independent of $q$. Differentiating by $q$, using Eq. (4.49), and partially integrating with the assumption that $P(q,y)$ decays sufficiently fast for large $|y|$, we wind up with the PDE

$$ \partial_q P = \tfrac{1}{2}\,\partial_y^2 P - x\,\partial_y(\mu P). \tag{4.53} $$

Here the time $q$ evolves in forward direction, from 0 to 1. The equivalent of the field $P(q,y)$, evolving from the initial condition in our notation

$$ P(0,y) = \delta(y), \tag{4.54} $$

was introduced by Sompolinsky in a dynamical context for the SK model [53]. In this case the average (4.52) assumes the alternative forms

$$ \int dy\, P(q,y)\,\vartheta(q,y) = \int dy\, P(1,y)\,\vartheta(1,y) = \vartheta(0,0). \tag{4.55} $$

Eq. (4.53) is in fact a Fokker-Planck equation with $x(q)\,\mu(q,y)$ as drift. The initial condition (4.54) is normalized to 1 and localized at the origin. Hence follows the conservation of the norm

$$ \int dy\, P(q,y) = 1, \tag{4.56} $$

and the non-negativity of the field $P(q,y)$. Thus $P(q,y)$ can be interpreted as a $q$-time-dependent probability density. We will refer to the initial value problem (4.53, 4.54), which determines Sompolinsky's probability field $P(q,y)$, as Sompolinsky's PDE (SPDE) hereafter.
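
The probabilistic reading of the SPDE can be illustrated numerically in the simplest setting. The following is a minimal sketch, not part of the original formalism: it assumes a trivial plateau $x = 0$, where the drift term drops out and the SPDE is pure diffusion, and it starts from the exact Gaussian at a small $q > 0$ because a Dirac delta cannot be placed on a grid. The function names, the grid parameters and the explicit finite-difference scheme are all choices made for this illustration.

```python
import math

def gaussian(y, variance):
    """Zero-mean Gaussian density with the given variance."""
    return math.exp(-y * y / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def evolve_plateau(q_start, q_end, y_max=5.0, dy=0.05, dq=0.001):
    """Explicit finite-difference evolution of dP/dq = (1/2) d^2P/dy^2.

    Assumption of this sketch: x = 0, so the drift x*mu is absent.
    The run starts from the exact Gaussian at q_start > 0.
    """
    n = int(round(2 * y_max / dy)) + 1
    ys = [-y_max + i * dy for i in range(n)]
    p = [gaussian(y, q_start) for y in ys]
    steps = int(round((q_end - q_start) / dq))
    lam = 0.5 * dq / (dy * dy)  # must not exceed 1/2 for stability and positivity
    for _ in range(steps):
        new = p[:]
        for i in range(1, n - 1):
            new[i] = p[i] + lam * (p[i + 1] - 2.0 * p[i] + p[i - 1])
        p = new
    return ys, p, dy

ys, p, dy = evolve_plateau(0.05, 0.5)
norm = sum(p) * dy                                        # conserved norm
variance = sum(y * y * v for y, v in zip(ys, p)) * dy     # grows linearly with q
max_dev = max(abs(v - gaussian(y, 0.5)) for y, v in zip(ys, p))
```

In this setting the norm stays at one, the field stays non-negative, and the variance equals the time $q$, as expected for a density spreading from the origin.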

Analogously, the field $S(q,y)$ adjoint to $\eta(q,y)$ satisfies

$$ \partial_q S = \tfrac{1}{2}\,\partial_y^2 S - x\,\mu\,\partial_y S, \tag{4.57} $$

that renders

$$ \int dy\, S(q,y)\,\eta(q,y) \tag{4.58} $$

constant in $q$. Obviously $\partial_y S$ satisfies the SPDE (4.53).

The Cole-Hopf transformation can be extended to $\vartheta(q,y)$. This is done by the recipe that in the intervals with $\dot{x} = 0$ the new field exhibits pure diffusion. Suppose that $\psi(q,y)$ satisfies (4.34), then let

$$ \nu(q,y) = \vartheta(q,y)\,\psi(q,y), \tag{4.59} $$

whence

$$ \partial_q \nu = -\tfrac{1}{2}\,\partial_y^2 \nu + \frac{\dot{x}}{x}\,\nu \ln\psi. \tag{4.60} $$

Similarly, the analog of the Cole-Hopf transformation for the field $P(q,y)$ adjoint to $\vartheta(q,y)$ is

$$ T(q,y) = P(q,y)/\psi(q,y), \tag{4.61} $$

satisfying

$$ \partial_q T = \tfrac{1}{2}\,\partial_y^2 T - \frac{\dot{x}}{x}\,T \ln\psi. \tag{4.62} $$

If $\dot{x} = 0$ then the PDE-s (4.60, 4.62) indeed reduce to the equation for pure diffusion. Based on that the $\vartheta$ and $P$ fields can be evaluated along plateaus of $x(q)$ straightforwardly.
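
The PDE for the product field $\vartheta\psi$ can be checked in a few lines. The sketch below assumes the Cole-Hopf transformed PPDE in the form $\partial_q\psi = -\tfrac12\partial_y^2\psi + (\dot{x}/x)\,\psi\ln\psi$ and the relation $x\,\mu\,\psi = \partial_y\psi$, both of which follow from $\psi = e^{x\varphi}$ but are written out here as assumptions, since Eqs. (4.34, 4.35) are not repeated in this section. The linearized equation (4.49) is used for $\vartheta$.

```latex
\begin{align*}
\partial_q(\vartheta\psi)
 &= \psi\,\partial_q\vartheta + \vartheta\,\partial_q\psi \\
 &= \psi\left(-\tfrac12\partial_y^2\vartheta - x\mu\,\partial_y\vartheta\right)
   + \vartheta\left(-\tfrac12\partial_y^2\psi + \tfrac{\dot x}{x}\,\psi\ln\psi\right) \\
 &= -\tfrac12\left(\psi\,\partial_y^2\vartheta + 2\,\partial_y\psi\,\partial_y\vartheta
   + \vartheta\,\partial_y^2\psi\right) + \tfrac{\dot x}{x}\,\vartheta\psi\ln\psi \\
 &= -\tfrac12\,\partial_y^2(\vartheta\psi) + \tfrac{\dot x}{x}\,(\vartheta\psi)\ln\psi .
\end{align*}
```

The only nontrivial step is the third line, where $x\mu\psi\,\partial_y\vartheta$ was replaced by $\partial_y\psi\,\partial_y\vartheta$; the nonlinear drift of the linearized PPDE is thereby absorbed into a total second derivative.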



4. Green functions



The PDE-s previously considered were of the form

$$ \partial_q X(q,y) = \mathcal{L}(q,y,\partial_y)\, X(q,y) + h(q,y), \tag{4.63} $$

where the unknown field is $X(q,y)$ and the time $q$ may evolve in either increasing or decreasing direction. The differential operator $\mathcal{L}(q,y,\partial_y)$ is possibly nonlinear in $X$, may be $q$- and $y$-dependent, and contains partial derivatives by $y$. For vanishing argument $X = 0$ the operator gives zero, $\mathcal{L}0 = 0$. We included the additive term $h(q,y)$ for the sake of generality; it was absent from the PDE-s we encountered so far.

In what follows we shall introduce Green functions (GF-s) for linear as well as nonlinear PDE-s. Suppose that $X(q,y)$ is the unique solution of a PDE like (4.63) with some initial condition. The GF associated with the PDE for the field $X(q,y)$ is defined as

$$ \mathcal{G}_X(q_1,y_1;q_2,y_2) = \frac{\delta X(q_1,y_1)}{\delta X(q_2,y_2)}. \tag{4.64} $$

This may be viewed as the response of the solution $X$ at $q_1$ to an infinitesimal change of the initial condition at $q_2$. The above definition yields a retarded GF, that is, if the PDE for $X$ evolves towards increasing (decreasing) $q$ then the GF vanishes for $q_1 < q_2$ ($q_1 > q_2$). Obviously

$$ \mathcal{G}_X(q,y_1;q,y_2) = \delta(y_1 - y_2). \tag{4.65} $$

The chain rule for the functional derivative in (4.64) can be expressed as

$$ \mathcal{G}_X(q_1,y_1;q_2,y_2) = \int dy\, \mathcal{G}_X(q_1,y_1;q,y)\, \mathcal{G}_X(q,y;q_2,y_2), \tag{4.66} $$

where $q$ is in the interval delimited by $q_1$ and $q_2$. This is just the customary composition rule for GF-s. In terms of the adjoint property, (4.66) means that the adjoint field to the GF in its fore variables is the same GF in its hind variables. The PDE-s the GF satisfies in its fore and hind variables are, therefore, each other's adjoint equations.

The definition (4.64) applies both to linear and nonlinear PDE-s (4.63). It is the specialty of the linear PDE that $\mathcal{G}_X(q_1,y_1;q_2,y_2)$ satisfies the same PDE in the variables $q_1, y_1$ with additive term $h(q_1,y_1) = \pm\,\delta(q_1 - q_2)\,\delta(y_1 - y_2)$, where the sign is $+$ if the time $q$ in the PDE (4.63) increases and $-$ if it decreases. Then the solution can be given in terms of the GF-s in the usual form

$$ X(q_1,y_1) = \int dy_2\, \mathcal{G}_X(q_1,y_1;q_2,y_2)\, X(q_2,y_2) + \int dy_2 \int_{q_2}^{q_1} dq\, \mathcal{G}_X(q_1,y_1;q,y_2)\, h(q,y_2). \tag{4.67} $$



If the PDE for $X$ is nonlinear then $\mathcal{G}_X(q_1,y_1;q_2,y_2)$ is the GF for the PDE that is obtained from the aforementioned PDE by linearization as performed at the beginning of Section IV B 3. In short, the GF of a nonlinear PDE is the GF of its linearized version. Note that for a nonlinear PDE the GF is associated with a particular solution $X$ of it, for that solution usually enters some coefficients in the linearized PDE the GF satisfies.

Suppose now that the differential operator in (4.63) is $\mathcal{L}(q,\partial_y)$, i. e., it is translation invariant in $y$. Such is the case for the PPDE (4.36) and its derivatives. Then it is easy to see that

$$ Y(q,y) = \partial_y X(q,y) \tag{4.68} $$

will obey the PDE that is the linearization of the PDE for $X$. Therefore

$$ Y(q_1,y_1) = \int dy_2\, \mathcal{G}_X(q_1,y_1;q_2,y_2)\, Y(q_2,y_2) + \int dy_2 \int_{q_2}^{q_1} dq\, \mathcal{G}_X(q_1,y_1;q,y_2)\, \partial_{y_2} h(q,y_2). \tag{4.69} $$

If the PDE for $X$ is nonlinear then Eq. (4.67) does not but Eq. (4.69) does hold. The latter, however, is merely an identity and should not be considered as the solution producing $Y$ from an initial condition, because in order to calculate $\mathcal{G}_X$ the knowledge of $X$ and thus that of $Y$ is necessary.

A prominent role will be played by the GF for the field $\varphi(q,y)$ from the PPDE (4.36), that is, by $\mathcal{G}_\varphi(q_1,y_1;q_2,y_2)$. This GF was introduced and studied in Refs. [27,28]. Our first observation here is based on the fact that the linearization of the PPDE yielded Eq. (4.49) and the linearization of the derivative of the PPDE, Eq. (4.46), produced Eq. (4.51). Therefore the respective GF-s are identical,

$$ \mathcal{G}_\varphi(q_1,y_1;q_2,y_2) = \mathcal{G}_\vartheta(q_1,y_1;q_2,y_2), \tag{4.70} $$

$$ \mathcal{G}_\mu(q_1,y_1;q_2,y_2) = \mathcal{G}_\eta(q_1,y_1;q_2,y_2). \tag{4.71} $$

Given the initial condition (4.54) of the SPDE, its solution is

$$ P(q,y) = \mathcal{G}_P(q,y;0,0). \tag{4.72} $$

The GF-s $\mathcal{G}_P$ and $\mathcal{G}_\varphi$ were discussed for the SK model in Ref. [29]. Considering the constancy of (4.52) and (4.58) we have

$$ \mathcal{G}_\varphi(q_1,y_1;q_2,y_2) = \mathcal{G}_P(q_2,y_2;q_1,y_1), \tag{4.73} $$

$$ \mathcal{G}_\mu(q_1,y_1;q_2,y_2) = \mathcal{G}_S(q_2,y_2;q_1,y_1). \tag{4.74} $$

An identity between derivatives of GF-s can be obtained from Eqs. (4.50) and (4.67) as

$$ \partial_{y_2}\, \mathcal{G}_\varphi(q_2,y_2;q_1,y_1) = -\,\partial_{y_1}\, \mathcal{G}_\mu(q_2,y_2;q_1,y_1). \tag{4.75} $$



Because of their central significance, we display the equations the GF of the field $\varphi$ satisfies. In its fore set of arguments $\mathcal{G}_\varphi(q_1,y_1;q_2,y_2)$ satisfies

$$ \partial_{q_1} \mathcal{G}_\varphi = -\tfrac{1}{2}\,\partial_{y_1}^2 \mathcal{G}_\varphi - x(q_1)\,\mu(q_1,y_1)\,\partial_{y_1} \mathcal{G}_\varphi - \delta(q_1 - q_2)\,\delta(y_1 - y_2), \tag{4.76} $$

where the differential operator on the r. h. s. is the same as on the r. h. s. of (4.49). In the hind set, with regard to the identity (4.73) and the SPDE (4.53), we obtain a PDE

$$ \partial_{q_2} \mathcal{G}_\varphi = \tfrac{1}{2}\,\partial_{y_2}^2 \mathcal{G}_\varphi - x(q_2)\,\partial_{y_2}\!\left(\mu(q_2,y_2)\,\mathcal{G}_\varphi\right) + \delta(q_1 - q_2)\,\delta(y_1 - y_2), \tag{4.77} $$

whose r. h. s. contains the same differential operator as the r. h. s. of the SPDE. The norm in the second, $y_2$, argument is conserved as

$$ \int \mathcal{G}_\varphi(q_1,y_1;q_2,y_2)\, dy_2 = 1 \tag{4.78} $$

for $q_1 < q_2$.

Equation (4.67) shows how a particular solution of the linear PDE with a source can be expressed by means of the GF. For example, suppose that the source field $h(q,y)$ is added to the linearized PPDE as

$$ \partial_q \vartheta = -\tfrac{1}{2}\,\partial_y^2 \vartheta - x\,\mu\,\partial_y \vartheta + h \tag{4.79} $$

and an initial condition $\vartheta(q_1,y)$ is set for some $0 < q_1 \le 1$. Then we have the solution for $0 \le q \le q_1$ in the form

$$ \vartheta(q,y) = \int dy_1\, \mathcal{G}_\varphi(q,y;q_1,y_1)\,\vartheta(q_1,y_1) - \int_q^{q_1} dq_2 \int dy_2\, \mathcal{G}_\varphi(q,y;q_2,y_2)\, h(q_2,y_2). \tag{4.80} $$

The derivative field (4.45) satisfies (4.46), thus it also satisfies the above PDE (4.79) with zero source, whence

$$ \mu(q,y) = \int dy_1\, \mathcal{G}_\varphi(q,y;1,y_1)\, \Phi'(y_1). \tag{4.81} $$

Derivation of $\mu$ gives $\kappa$ as from (4.47), which satisfies the PDE (4.48). Its solution can be expressed in terms of the GF associated to $\mu$ as

$$ \kappa(q,y) = \int dy_1\, \mathcal{G}_\mu(q,y;1,y_1)\, \Phi''(y_1). \tag{4.82} $$

Note that the relation (4.75) is necessary to maintain (4.47).

So far we considered the GF-s of $\varphi$ and its derivative fields. It is also instructive to see their relation to the GF of the field $\psi$. Starting from the definition (4.64) of the GF and using the Cole-Hopf formula (4.35) we get

$$ \mathcal{G}_\psi(q_1,y_1;q_2,y_2) = \frac{x(q_1)\,\psi(q_1,y_1)}{x(q_2)\,\psi(q_2,y_2)}\; \mathcal{G}_\varphi(q_1,y_1;q_2,y_2). \tag{4.83} $$

From the PDE-s (4.76, 4.77) for $\mathcal{G}_\varphi$ we have for $\mathcal{G}_\psi(q_1,y_1;q_2,y_2)$

$$ \partial_{q_1} \mathcal{G}_\psi = -\tfrac{1}{2}\,\partial_{y_1}^2 \mathcal{G}_\psi + \frac{\dot{x}(q_1)}{x(q_1)}\left(\ln\psi(q_1,y_1) + 1\right)\mathcal{G}_\psi - \delta(q_1 - q_2)\,\delta(y_1 - y_2), \tag{4.84a} $$

$$ \partial_{q_2} \mathcal{G}_\psi = \tfrac{1}{2}\,\partial_{y_2}^2 \mathcal{G}_\psi - \frac{\dot{x}(q_2)}{x(q_2)}\left(\ln\psi(q_2,y_2) + 1\right)\mathcal{G}_\psi + \delta(q_1 - q_2)\,\delta(y_1 - y_2). \tag{4.84b} $$

Equation (4.84a) could also be obtained by linearization of the PDE (4.34a), while (4.84b) is its adjoint. These PDE-s are particularly useful if $\dot{x}(q) \equiv 0$, because then they reduce to pure diffusion. One can view relation (4.83) as the translation of the Cole-Hopf transformation (4.35) onto the GF-s. We again see the advantage of keeping track of a Cole-Hopf transformed pair like $\mathcal{G}_\psi$ and $\mathcal{G}_\varphi$, because $\mathcal{G}_\psi$ is simple for plateaus in $x(q)$ and $\mathcal{G}_\varphi$ is useful when $\dot{x}(q) > 0$, especially at jumps.

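Notation in subsequent sections can be shortened by the introduction of what we shall call vertex functions

$$ \Gamma_{\varphi\varphi\varphi}\!\left(q;\{q_i,y_i\}_{i=1}^{3}\right) = \int dy\, \mathcal{G}_\varphi(q_1,y_1;q,y)\, \mathcal{G}_\varphi(q,y;q_2,y_2)\, \mathcal{G}_\varphi(q,y;q_3,y_3), \tag{4.85} $$

$$ \Gamma_{\varphi\mu\mu}\!\left(q;\{q_i,y_i\}_{i=1}^{3}\right) = \int dy\, \mathcal{G}_\varphi(q_1,y_1;q,y)\, \mathcal{G}_\mu(q,y;q_2,y_2)\, \mathcal{G}_\mu(q,y;q_3,y_3). \tag{4.86} $$

The ordering $q_1 \le q \le q_2$, $q_1 \le q \le q_3$ is understood. The vertex functions satisfy the appropriate linear PDE in each pair $q_i, y_i$; furthermore, if $q$ coincides with say $q_j$ then the vertex functions reduce to the product of the other two GF-s with $q_i$, $i \neq j$.

As shown for $q_1 < q < q_2$, $q_1 < q < q_3$ in the Appendix, we have the useful identity

$$ \partial_q\, \Gamma_{\varphi\varphi\varphi}\!\left(q;\{q_i,y_i\}_{i=1}^{3}\right) = \partial_{y_2}\partial_{y_3}\, \Gamma_{\varphi\mu\mu}\!\left(q;\{q_i,y_i\}_{i=1}^{3}\right). \tag{4.87} $$

A notable consequence of that is obtained from the fact that $\mu(q,y)$ of Eq. (4.45) and $\kappa(q,y)$ of Eq. (4.47) are evolved by $\mathcal{G}_\varphi$ and $\mathcal{G}_\mu$, respectively, as it follows from Eq. (4.69). Therefore, multiplication of (4.87) by the initial conditions $\Phi'(y_i) = \mu(1,y_i)$, for $i = 2, 3$, and integration by those $y_i$-s gives for $q_1 < q$

$$ \partial_q \int dy\, \mathcal{G}_\varphi(q_1,y_1;q,y)\, \mu(q,y)^2 = \int dy\, \mathcal{G}_\varphi(q_1,y_1;q,y)\, \kappa(q,y)^2. \tag{4.88} $$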
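
Relation (4.88) can also be verified directly, which may serve as a consistency check on the PDE-s collected above. The sketch below writes $\mathcal{G}$ for the GF of $\varphi$ taken in its hind variables $(q,y)$ with $q_1 < q$, uses the hind-variable PDE (4.77) away from the delta term and the PDE (4.46a) for $\mu$, and assumes that all boundary terms of the partial integrations in $y$ vanish.

```latex
\begin{align*}
\partial_q \int dy\, \mathcal{G}\,\mu^2
 &= \int dy \left[\tfrac12\partial_y^2\mathcal{G} - x\,\partial_y(\mu\,\mathcal{G})\right]\mu^2
   + \int dy\, \mathcal{G}\; 2\mu\,\partial_q\mu \\
 &= \int dy\, \mathcal{G}\left[\tfrac12\partial_y^2(\mu^2) + x\,\mu\,\partial_y(\mu^2)\right]
   + \int dy\, \mathcal{G}\; 2\mu\left(-\tfrac12\partial_y^2\mu - x\,\mu\,\partial_y\mu\right) \\
 &= \int dy\, \mathcal{G}\left[(\partial_y\mu)^2 + \mu\,\partial_y^2\mu + 2x\,\mu^2\partial_y\mu
   - \mu\,\partial_y^2\mu - 2x\,\mu^2\partial_y\mu\right] \\
 &= \int dy\, \mathcal{G}\,(\partial_y\mu)^2 = \int dy\, \mathcal{G}\,\kappa^2 .
\end{align*}
```

All drift terms cancel, so the growth rate of the averaged $\mu^2$ is the averaged $\kappa^2$, irrespective of the form of $x(q)$.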

The mathematical properties of the PDE-s will acquire physical meaning in subsequent chapters where thermodynamical properties are studied.



5. Evolution along plateaus 

Here we collect the few obvious formulas describing the evolution of some fields along the trivial $x = 0$ and $x = 1$ plateaus, and give the GF for $\varphi$ for any plateau.

Let us consider firstly the $x = 0$ plateau, i. e., the region $0 \le q < q_{(0)}$. We recall the Cole-Hopf formula (4.35) for the field $\psi(q,y)$ to obtain

$$ \psi(q,y) \equiv 1. \tag{4.89} $$

The field $\varphi(q,y)$ obeys the PPDE (4.36), thus is purely diffusive for $x = 0$ as

$$ \varphi(q,y) = \int Dz\; \varphi\!\left(q_{(0)},\, y + z\sqrt{q_{(0)} - q}\right). \tag{4.90} $$

Due to continuity of $\varphi$ in $q$ this also holds for $q = q_{(0)}$. The probability field $P(q,y)$ from the SPDE (4.53, 4.54) is the Gaussian function

$$ P(q,y) = G(y,q), \tag{4.91} $$

where the notation

$$ G(x,\sigma) = \frac{1}{\sqrt{2\pi\sigma}} \exp\!\left(-\frac{x^2}{2\sigma}\right) \tag{4.92} $$

was used.

In the region $q_{(1)} \le q \le 1$ of the $x = 1$ plateau, we have

$$ \psi(q,y) = \int Dz\, \exp\Phi\!\left(y + z\sqrt{1-q}\right), \tag{4.93a} $$

$$ \varphi(q,y) = \ln\psi(q,y). \tag{4.93b} $$

The time-dependent probability field $P(q,y)$ is best evaluated along plateaus by its own version, (4.61), of the Cole-Hopf transformation. The transformed field $T(q,y)$ obeys (4.62), so it reduces to pure diffusion along a plateau. Thus, assuming the knowledge of $P(q_{(1)},y)$ and having $\varphi(q,y)$ from (4.93) we get

$$ P(q,y) = e^{\varphi(q,y)} \int Dz\; e^{-\varphi\left(q_{(1)},\, y + z\sqrt{q - q_{(1)}}\right)}\, P\!\left(q_{(1)},\, y + z\sqrt{q - q_{(1)}}\right). \tag{4.94} $$

The GF for the field $\varphi$, $\mathcal{G}_\varphi$, will be given on any plateau. Suppose that $x(q) \equiv x$ in the closed interval $[q_1, q_2]$. Then from (4.84) $\mathcal{G}_\psi$, for a positive plateau value $x$, is a Gaussian function. Then $\mathcal{G}_\varphi$ becomes from (4.83)

$$ \mathcal{G}_\varphi(q_1,y_1;q_2,y_2) = e^{x\left(\varphi(q_2,y_2) - \varphi(q_1,y_1)\right)}\, G(y_2 - y_1,\, q_2 - q_1), \tag{4.95} $$

where the notation (4.92) has been used. The GF remains to be determined on the trivial plateau $x = 0$, that is obtained from say (4.76) as

$$ \mathcal{G}_\varphi(q_1,y_1;q_2,y_2) = G(y_2 - y_1,\, q_2 - q_1). \tag{4.96} $$



This is the same as we would get from (4.95) by substituting x = 0. 
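
The Gaussian GF of the trivial plateau offers the simplest check of the composition rule (4.66) and of the norm conservation (4.78). The following is a minimal numerical sketch; the function names, the midpoint quadrature and the sample values of $q$ and $y$ are choices made for this illustration, and the Gaussian is parametrized by its variance.

```python
import math

def gaussian(x, variance):
    """Zero-mean Gaussian density parametrized by its variance."""
    return math.exp(-x * x / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)

def plateau_gf(q1, y1, q2, y2):
    """GF of the field phi on the x = 0 plateau; q1 < q2 is assumed."""
    return gaussian(y2 - y1, q2 - q1)

def integrate(f, lo=-12.0, hi=12.0, n=4800):
    """Midpoint rule on [lo, hi]; ample for the rapidly decaying integrands used here."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))

# Hypothetical sample points with q1 < q < q2 inside the plateau.
q1, q, q2 = 0.1, 0.3, 0.6
y1, y2 = -0.4, 0.7

# Composition rule: integrate over the intermediate local field y.
composed = integrate(lambda y: plateau_gf(q1, y1, q, y) * plateau_gf(q, y, q2, y2))
direct = plateau_gf(q1, y1, q2, y2)

# Norm conservation in the hind y-argument.
norm = integrate(lambda y_hind: plateau_gf(q1, y1, q2, y_hind))
```

Both checks reduce to the familiar fact that the convolution of two Gaussians is a Gaussian whose variance is the sum of the two variances.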



6. Discontinuous initial conditions 



If the initial condition $\Phi(y)$ of the PPDE (4.36) is discontinuous then special care is necessary near $q = 1$. While strictly speaking the PPDE is defined only for initial conditions twice differentiable by $y$, one may expect that for practical purposes a much less strict condition suffices. For instance, in the textbook example of pure diffusion any function whose convolution with the Gaussian GF gives a finite result can be accepted as initial condition irrespective of its differentiability. The physical picture is that diffusion smoothens steps and spikes and brings the solution into a differentiable form within an infinitesimal amount of time.

The problem with the PPDE for a discontinuous initial condition lies deeper. It can be traced back to the fact that the Cole-Hopf transformation no longer connects the two PDE-s (4.34) and (4.36). Even if by means of the Dirac delta we accept differentiation through a discontinuity, the derivatives of $\psi(1,y) = \exp\Phi(y)$ and $\varphi(1,y) = \Phi(y)$ are not related by the chain rule, namely

$$ \Phi'(y)\, e^{\Phi(y)} \neq \frac{d}{dy}\, e^{\Phi(y)}. \tag{4.97} $$

This can be seen easily by taking for example the step function

$$ \Phi(y) = \alpha\,\theta(y). \tag{4.98} $$

Then

$$ e^{\Phi(y)} = 1 + (e^{\alpha} - 1)\,\theta(y), \tag{4.99} $$

and the inequality (4.97) now takes the form

$$ \alpha\,\delta(y)\left[1 + (e^{\alpha} - 1)\,\theta(y)\right] \neq (e^{\alpha} - 1)\,\delta(y). \tag{4.100} $$

Equality could only be restored if $\theta(y = 0)$ were chosen $\alpha$-dependent, an artifact we do not accept. However, the derivation of the PPDE (4.36) from the PDE (4.34) is invalid if the chain rule cannot be applied.

The difficulty can be circumvented by our using the explicit expressions (4.93) for the fields $\psi(q,y)$, $\varphi(q,y)$ in the interval $[q_{(1)}, 1]$. Obviously, even if there is a discontinuity (a finite step) in $\Phi(y)$, the $\psi(q,y)$ and thus $\varphi(q,y)$ will become smooth for $q < 1$. For instance for (4.98), using the notation

$$ H(x) = \tfrac{1}{2}\left[1 - \mathrm{erf}\!\left(\frac{x}{\sqrt{2}}\right)\right], \tag{4.101} $$

we have

$$ \psi(q,y) = e^{\alpha} + (1 - e^{\alpha})\, H\!\left(\frac{y}{\sqrt{1-q}}\right) \quad \text{for } q_{(1)} \le q < 1. \tag{4.102} $$

This is an analytic function for $q \neq 1$ and becomes (4.99) for $q \to 1$. Then $\varphi(q,y)$ is obtained in $[q_{(1)}, 1]$ by (4.93b), also analytic for $q \neq 1$, and $\varphi(1,y)$ becomes indeed (4.98). The above formulas extend down to $q_{(1)}$. Interestingly, as we shall see later, in the limit of the ground state $T \to 0$, we have $q_{(1)} \to 1$, but the discontinuity of the fields equally disappears at $q_{(1)}$, although analyticity will not hold.
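
The smoothing of a step by the Gaussian average on the $x = 1$ plateau can be checked numerically. The sketch below compares the closed form built from the complementary error function with a direct quadrature of the Gaussian average of $e^{\Phi}$ for the step $\Phi(y) = \alpha\,\theta(y)$. The value of `ALPHA`, the function names and the quadrature parameters are hypothetical choices for this illustration only.

```python
import math

ALPHA = -1.0  # hypothetical step height in Phi(y) = ALPHA * theta(y)

def H(x):
    """Gaussian tail probability, H(x) = (1/2) * [1 - erf(x / sqrt(2))]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def phi_initial(y):
    """Step-function initial condition Phi(y)."""
    return ALPHA if y > 0.0 else 0.0

def psi_closed(q, y):
    """Closed form of psi(q, y) for the step, valid for q < 1."""
    return math.exp(ALPHA) + (1.0 - math.exp(ALPHA)) * H(y / math.sqrt(1.0 - q))

def psi_quadrature(q, y, z_max=8.0, n=16000):
    """Midpoint quadrature of the Gaussian average of exp(Phi(y + z * sqrt(1 - q)))."""
    s = math.sqrt(1.0 - q)
    h = 2.0 * z_max / n
    total = 0.0
    for i in range(n):
        z = -z_max + (i + 0.5) * h
        weight = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        total += weight * math.exp(phi_initial(y + z * s))
    return total * h

samples = [(0.5, 0.3), (0.9, -0.2), (0.99, 0.05)]
deviations = [abs(psi_closed(q, y) - psi_quadrature(q, y)) for q, y in samples]
```

The quadrature converges only linearly in the step size because the integrand jumps where $y + z\sqrt{1-q}$ changes sign, which is itself a reminder of why the closed form is preferable near $q = 1$.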

Thus we have the fields for $q < 1$; the only problem remaining is that we cannot say that $\varphi(q,y)$ satisfies the PPDE (4.36) at $q = 1$, because of the inequality (4.97).

The difference in nature between the $\psi$ and $\varphi$ functions for $q_{(1)} < q \le 1$ can be illustrated by the following. The singularity of the PDE-s can be tamed by our considering the fields as integral kernels. Let us take an analytic function $a(y)$ such that itself and its derivatives decay sufficiently fast for large arguments and consider

$$ A_\psi(q) = \int dy\, a(y)\,\psi(q,y). \tag{4.103} $$

Starting from (4.93a), changing the integration variable as $y \to y - z\sqrt{1-q}$, and formally expanding in terms of $\sqrt{1-q}$ we get

$$ A_\psi(q) = \sum_{k=0}^{\infty} \frac{(1-q)^k}{2^k\, k!} \int dy\, a^{(2k)}(y)\, e^{\Phi(y)}, \tag{4.104} $$

where we used $\int Dz\, z^{2k+1} = 0$, $\int Dz\, z^{2k} = (2k-1)!!$, and the notation

$$ a^{(k)}(x) = \frac{d^k a(x)}{dx^k}. \tag{4.105} $$



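On the other hand, a similar procedure can be carried out for

$$ A_\varphi(q) = \int dy\, a(y)\,\varphi(q,y), \tag{4.106} $$

a case we illustrate on (4.98). From (4.102) we have

$$ \varphi(q,y) = \ln\!\left[e^{\alpha} + (1 - e^{\alpha})\, H\!\left(\frac{y}{\sqrt{1-q}}\right)\right] = \tilde{\varphi}\!\left(\frac{y}{\sqrt{1-q}}\right), \tag{4.107} $$

where the last equality defines the single-argument function $\tilde{\varphi}(z)$. Then

$$ A_\varphi(q) = A_\varphi(1) + \int dy\, a(y)\left(\tilde{\varphi}\!\left(\frac{y}{\sqrt{1-q}}\right) - \alpha\,\theta(y)\right), \tag{4.108} $$

where $A_\varphi(1)$ was added to and subtracted from the r. h. s. Changing the integration variable as $y \to y\sqrt{1-q}$ and formally expanding by $\sqrt{1-q}$ we get

$$ A_\varphi(q) = A_\varphi(1) + \sum_{k=0}^{\infty} \frac{(1-q)^{(k+1)/2}}{k!}\, a^{(k)}(0) \int dy\, y^k \left(\tilde{\varphi}(y) - \alpha\,\theta(y)\right). \tag{4.109} $$

Thus in leading order we have from (4.104, 4.109)

$$ A_\psi(q) - A_\psi(1) \propto 1 - q, \tag{4.110a} $$

$$ A_\varphi(q) - A_\varphi(1) \propto \sqrt{1-q}. \tag{4.110b} $$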



So, considering the fields as integral operators in the case of non-differentiable initial conditions, we see from Eq. (4.110) that $\psi$ does but $\varphi$ does not have a finite derivative by $q$ at $q = 1$. This explains why we could maintain the PDE for $\psi$ while the PPDE had to be given up at $q = 1$ with a non-differentiable initial condition.

If the PPDE (4.36) is ill defined for $q = 1$ then so may be the PDE-s for the derivative fields, the linearized PDE-s, and the PDE-s for the GF-s, as discussed in Sections IV B 2, IV B 3 and IV B 4. We settle the ambiguity by redefining the derivative field $\mu(q,y)$ as

$$ \mu(q,y)\, x(q)\, \psi(q,y) = \partial_y \psi(q,y), \tag{4.111} $$

which in the interval $[q_{(1)}, 1]$, where $x(q) \equiv 1$, reads as

$$ \mu(q,y)\,\psi(q,y) = \int Dz\; \partial_y\, e^{\Phi\left(y + z\sqrt{1-q}\right)}. \tag{4.112} $$

For a smooth $\Phi(y)$ one recovers the original definition (4.45) for any $q$. If, however, $\Phi(y)$ is discontinuous then, due to the inequality (4.97), the new formula (4.112) will, in general, differ from (4.45) at $q = 1$. The $\mu(q,y)$ from (4.112) satisfies in $[q_{(1)}, 1]$

$$ \partial_q \mu = -\tfrac{1}{2}\,\partial_y^2 \mu - \mu\,\partial_y \mu, \tag{4.113a} $$

$$ \mu(1,y) = e^{-\Phi(y)}\, \frac{d}{dy}\, e^{\Phi(y)}. \tag{4.113b} $$

The specialty here is that the derivation of (4.113) could be done without the now invalid chain rule. The above PDE coincides with (4.46a) at $x = 1$, with an initial condition that may be different from (4.46b).



In a similar spirit it can be shown that the $\mu(q,y)$ redefined above enters the PDE-s (4.76, 4.77) for the GF $\mathcal{G}_\varphi$, provided the latter is introduced by our first giving $\mathcal{G}_\psi$ via (4.64) then defining $\mathcal{G}_\varphi$ via (4.83). Note that the GF $\mathcal{G}_\varphi$ is given in the interval $[q_{(1)}, 1]$ by (4.95) with $x = 1$, a smooth function in the $y$-arguments if both $q$ arguments are less than 1.



The continuous framework, with PDE-s, was meant to be a practical reformulation of the iteration (4.15). Its real use is in the $R \to \infty$ limit, when it allows more liberty in the parametrization of a finite approximation than just the taking of a large but finite $R$. In case of ambiguity, however, the iteration takes precedence. That argument helped us to refine our formalism of PDE-s for discontinuous initial conditions.

In what follows we will use the short notation made possible by the PDE formalism as if we were dealing with a continuous initial condition $\Phi(y)$. However, if $\Phi(y)$ is discontinuous then the PPDE must not be applied at $q = 1$, rather (4.93) yields the field $\varphi(q,y)$ in $[q_{(1)}, 1]$. So although then the PPDE is not true at $q = 1$, we keep it and understand it as the above recipe. The derivative of the PPDE can be upheld with the above definition of the derivative field $\mu$, as can the PDE-s for the GF $\mathcal{G}_\varphi$. In concrete computations on a discontinuous initial condition we shall see that this takes care of most of the problem.



V. CORRELATIONS AND THERMODYNAMICAL STABILITY



A. Expectation values 



1. Replica averages 



In this section we evaluate important special cases of the generalized averages (3.19) and (3.23) within Parisi's ansatz. In what follows, generically the knowledge of $Q$, or equivalently, in the $n \to 0$ limit, that of $x(q)$ will be assumed. Practically, all fields introduced above as solutions of various PDE-s, for given $x(q)$, will be considered as known and expectation values expressed in terms of those fields. The pioneering works in this subject are those of de Almeida and Lage [27] and of Mézard and Virasoro [28], who evaluated the average magnetization and its low-order moments in the SK model. What follows in Section V A can be viewed as the generalization of the mechanism these authors uncovered.

We shall call the variable $y$ in (4.1) "local field". In the SK model $y$ corresponds to the local magnetic field, for the neuron it is the local stability parameter, and it is useful to have a name for it even in the present framework. The generic formula comprising (3.19) and (3.23) is



$$ [\![ A(x,y) ]\!] = \int \frac{d^n x\; d^n y}{(2\pi)^n}\; A(x,y)\, \exp\!\left(\sum_{a=1}^{n} \Phi(y_a) + i\,x y - \tfrac{1}{2}\, x Q x\right). \tag{5.1} $$

The normalizing coefficient, analogous to the prefactors in Eqs. (3.19) and (3.23), is not included here, since in the limit $n \to 0$ it becomes unity. We shall automatically disregard such factors henceforth; furthermore, we will take $n \to 0$ silently whenever appropriate. Dependence on $\Phi$ and $Q$ is not marked on the l. h. s.

The quantity (5.1) will be called the replica average of the function $A(x,y)$. Such formulas emerge in most cases when we set out to evaluate thermodynamical quantities in or near equilibrium.



2. Average of a function of a single local field 



A case of import is when the quantity to be averaged depends only on the local field $y_a$ of a single replica. Such is
the form of the distribution of stabilities given in Eq. ( 3.27 ) and the energy ( 3.2E ). Due to the fact that y a and x a are 
each other's Fourier transformed variables, the expectation values of replicated $x$-s, like in Eqs. (3.21, 3.24, 3.25), are related to the averages of products of functions of local fields $y_a$. The latter can be straightforwardly understood once the case of a function of one $y_a$ argument is clarified. Thus we firstly focus on

$$ C_A = [\![ A(y_1) ]\!]. \tag{5.2} $$



There is no loss of generality in choosing the first replica, $a = 1$, because RSB only affects groups of two or more replicas. Within Parisi's ansatz (4.4) the $C_A$ evaluates to a formula like the r. h. s. of (4.11) with the difference that
here A(yi) is inserted into the integrand. In analogy with (p2|) we obtain 



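$$ C_A = \int \prod_{r=0}^{R+1} \prod_{j_r=1}^{n/m_r} Dz^{(r)}_{j_r}\; A\!\left(\sum_{r=0}^{R+1} z^{(r)}_{1} \sqrt{q_r - q_{r-1}}\right) \exp\!\left(\sum_{a=1}^{n} \Phi\!\left(\sum_{r=0}^{R+1} z^{(r)}_{j_r(a)} \sqrt{q_r - q_{r-1}}\right)\right). \tag{5.3} $$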



We used the definition of $j_r(a)$ from Eq. (C1). In the argument of $A$ the $j_r(1) = 1$ label was inserted for the $z^{(r)}$-s. After a reasoning similar to that followed in Section IV A 1, again expressing the integer $m_r$ by the real $x_r$ from (4.12) and taking $n \to 0$, we arrive at the recursion

$$ \vartheta_{r-1}(y)\, \psi_{r-1}(y) = \int Dz\; \vartheta_r\!\left(y + z\sqrt{q_r - q_{r-1}}\right) \psi_r\!\left(y + z\sqrt{q_r - q_{r-1}}\right)^{x_r/x_{r+1}}, \tag{5.4a} $$

$$ \vartheta_{R+1}(y) = A(y), \tag{5.4b} $$

while the iteration of $\psi_r(y)$ is defined by Eqs. (4.15, 4.16). The final average is obtained at $r = 0$ as

$$ C_A = \int Dz\; \vartheta_0\!\left(z\sqrt{q_0}\right). \tag{5.5} $$

Using the identity (D1) we are led to the operator form

$$ \vartheta_{r-1}(y)\, \psi_{r-1}(y) = \exp\!\left[\tfrac{1}{2}(q_r - q_{r-1})\, \frac{d^2}{dy^2}\right] \vartheta_r(y)\, \psi_r(y)^{x_r/x_{r+1}}, \tag{5.6} $$

whence by continuation it is easy to derive the PDE

$$ \partial_q (\vartheta\psi) = -\tfrac{1}{2}\,\partial_y^2 (\vartheta\psi) + \frac{\dot{x}}{x}\, \vartheta\psi \ln\psi. \tag{5.7} $$



In the spirit of Section IV A 2 it is straightforward to show that this equation holds for finite $R$-RSB as well. Then at discontinuities of $x(q)$ the singular second term on the r. h. s. is absorbed by the requirement that $\psi(q,y)^{1/x(q)}$ is continuous in $q$. The initial condition for $\psi(q,y)$ was previously given in (4.34b) and that for $\vartheta(q,y)$ is set by (5.4b) as

$$ \vartheta(1,y) = A(y). \tag{5.8} $$



In Eq. (5.7) we recognize the PDE (4.60) for the field (4.59). Now we again have a product like (4.59), so the field $\vartheta(q,y)$ here also satisfies the PDE (4.49). Thus the sought average (5.5) can be written as

$$ C_A = \int Dz\; \vartheta\!\left(q_{(0)},\, z\sqrt{q_{(0)}}\right), \tag{5.9} $$

a functional of $\Phi(y)$ and $x(q)$, where the definition of $q_{(0)}$ by (4.42a) was used. A practical expression for the above average involves the adjoint field $P(q,y)$, obeying the PDE (4.53) and rendering the formula (4.52) independent of $q$. Let us recall the abbreviation for the Gaussian (4.92), then (5.9) is of the form of (4.52) at $q = q_{(0)}$ if

$$ P(q_{(0)}, y) = G(y, q_{(0)}). \tag{5.10} $$

Given the purely diffusive evolution in the interval $(0, q_{(0)})$, this condition means that $P(0,y)$ is localized at $y = 0$, i. e., $P(q,y)$ satisfies the SPDE (4.53, 4.54), whence we can write the expectation value in the form (4.55) as

$$ C_A = \int dy\, P(1,y)\, A(y). \tag{5.11} $$

This is the main result of this section. Here the initial condition (5.8) was used, which is just the function we intended to average. This expression reveals that $P(1,y)$ is the probability distribution of the quantity $y$, or, for a general $q$, $P(q,y)$ is the distribution at an intermediate stage of evolution.

Note that in [18] we gave a shorter derivation for (5.11), which avoided the use of the recursion (5.4). The reason for our going the longer way here is that it straightforwardly generalizes to the case of higher order correlation functions.


3. Correlations of functions of local fields 

The expectation value of a product of functions each depending on a single local field variable reads as

$$ C_{AB \dots Z}(a, b, \dots, z) = [\![ A(y_a)\, B(y_b) \cdots Z(y_z) ]\!]. \tag{5.12} $$

This will be called replica correlation function, or correlator, of the functions $A, B, \dots, Z$ of respective local fields $y_a, y_b, \dots, y_z$. Its "order" is the number of different local fields it contains. The natural generalization of the observations in the previous section allows us to construct formulas for the above correlation function. This will be undertaken in the present and the following two sections.

Let us first consider the second order local field correlator

$$ C_{AB}(a,b) = [\![ A(y_a)\, B(y_b) ]\!]. \tag{5.13} $$

The Parisi ansatz allows us to parametrize $C_{AB}$ by the $q$ variable, rather than the replica indices $a$ and $b$, remnants of the $n \times n$ matrix character of $Q$. This goes as follows. Fixing the replica indices $a$ and $b$ we obtain two iterations like (5.4), with respective initial conditions $A(y)$ and $B(y)$ at $q = 1$. The iterated functions we denote by $\vartheta_A$ and $\vartheta_B$, respectively. The iterations evolve until they reach an index $r(a,b)$ specified by the property that for all $r \le r(a,b)$ the $j_r$ indices coincide, $j_r(a) = j_r(b)$. Here we used the definition of the labels $j_r(a)$ from Eq. (C1), i. e., if $j_r = 1, \dots, n/m_r$ are the labels of "boxes" of replicas that contain $m_r$ replicas then $j_r(a)$ is the "serial number" of the box containing the $a$-th replica. The $r(a,b)$ marks the largest $r$ index for which the replicas $a$ and $b$ fall into the same box. Obviously, since for decreasing $r$ the box size $m_r$ increases, for any given $r < r(a,b)$ the said replicas will fall into the same box of size $m_r$. The $r(a,b)$ will be referred to hereafter as merger index, and is a given function of $a$ and $b$ for a given set of $m_r$-s of Eq. (4.5b).

The hierarchical organization of the replicas implies the following property. Consider three different replica indices $a$, $b$, and $c$, then either all three merger indices coincide as $r(a,b) = r(a,c) = r(b,c)$, or two merger indices coincide and the third one is larger, e. g., $r(a,c) = r(b,c) < r(a,b)$. Indeed, if $c$ shares a box with both $a$ and $b$ at some level, then $a$ and $b$ share that box too. This is characteristic for tree-like structures, for example, a maternal genealogical scheme.
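
The tree-like property of the merger indices can be checked by brute force for a small integer number of replicas. The sketch below is an illustration only: the box sizes in `BOX_SIZES` are hypothetical, and the labeling rule in `box_label`, which groups consecutive replicas into boxes, is an assumed stand-in for the definition of $j_r(a)$ in Eq. (C1), which is not reproduced here. The check is that in every triple the two smallest merger indices coincide.

```python
from itertools import combinations

def box_label(a, box_size):
    """Serial number of the box of the given size that contains replica a (a = 1, 2, ...).

    Assumption of this sketch: consecutive replicas are grouped together.
    """
    return (a - 1) // box_size + 1

def merger_index(a, b, box_sizes):
    """Largest r for which replicas a and b share a box of size box_sizes[r].

    box_sizes is assumed decreasing, from box_sizes[0] = n down to 1.
    """
    r_merge = 0
    for r, size in enumerate(box_sizes):
        if box_label(a, size) == box_label(b, size):
            r_merge = r
        else:
            break
    return r_merge

def triple_is_tree_like(a, b, c, box_sizes):
    """True if the two smallest of the three merger indices coincide."""
    indices = sorted([merger_index(a, b, box_sizes),
                      merger_index(a, c, box_sizes),
                      merger_index(b, c, box_sizes)])
    return indices[0] == indices[1]

BOX_SIZES = [12, 6, 2, 1]   # hypothetical hierarchy: n = 12 replicas, R = 2
N = BOX_SIZES[0]

all_tree_like = all(triple_is_tree_like(a, b, c, BOX_SIZES)
                    for a, b, c in combinations(range(1, N + 1), 3))
```

For example, replicas 1 and 2 share a box of size 2, so their merger index is 2, whereas each of them meets replica 3 only in the box of size 6, giving the merger index 1 twice.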



The merger index allows us to relabel the matrix elements (4.6) in the Parisi ansatz as

$$ q_{r(a,b)} = q_{ab}. \tag{5.14} $$



This we can consider as the definition of r(a, b), provided that giving q r uniquely determines r, that is, in ( |4.q ) strict 
inequalities hold. At the juncture $r = r(a,b)$ the two aforementioned iterations, so far each obeying (5.4a), merge into one, such that the product of the two "incoming" fields $\vartheta_A$ and $\vartheta_B$ at $r = r(a,b)$ gives the initial condition for the one "outgoing" iteration, denoted by $\vartheta_{AB}$. That is, for $r < r(a,b)$, again the iteration (5.4a) is to be used for $\vartheta_{AB,r}(y)$ such that at $r = r(a,b)$ it satisfies the initial condition

$$ \vartheta_{AB,r(a,b)}(y) = \vartheta_{A,r(a,b)}(y)\, \vartheta_{B,r(a,b)}(y). \tag{5.15} $$



Such merging of $\vartheta$ fields to produce an initial condition for further evolution will turn out to be ubiquitous whenever correlators are computed. After changing from the discrete $r$ index to the $q$ time variable, we obtain the expectation value in a form similar to (5.9) as



$$ C_{AB}(q_{r(a,b)}) = \int Dz\; \vartheta_{AB}\!\left(q_{(0)}, z\sqrt{q_{(0)}}\right). \tag{5.16} $$

Here we switched notation and denote the dependence on the initial $a$, $b$ replica indices through $q_{r(a,b)}$. Equivalently, replacing $q_{r(a,b)}$ by $q$, we get



$$ C_{AB}(q) = \int dy\, P(q,y)\, \vartheta_{AB}(q,y) = \int dy\, P(q,y)\, \vartheta_A(q,y)\, \vartheta_B(q,y). \tag{5.17} $$

Here only such $q$ is meaningful that equals a $q_r$ in the $R$-RSB ansatz, or is a limit of a $q_r$ if $R \to \infty$. However, this expression can be understood, at least formally, for all $q$-s in the interval $[0,1]$.
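As a sanity check on (5.17), the merging rule can be evaluated numerically in the simplest setting. The sketch below is not part of the original derivation: it assumes the limit $x(q) \equiv 0$ and the standard form of the equations, so that (4.49) reduces to pure backward diffusion, $\vartheta_A(q,y) = \int Dz\, A(y + z\sqrt{1-q})$, and $P(q,y)$ is a Gaussian of variance $q$. The function names and the midpoint quadrature are illustrative choices. Under these assumptions $y_a$ and $y_b$ are jointly Gaussian with unit variance and covariance $q$, which gives closed forms to compare with.

```python
import math

def gauss_rule(n=200, zmax=8.0):
    """Midpoint nodes and weights approximating Dz = dz exp(-z^2/2)/sqrt(2 pi)."""
    h = 2.0 * zmax / n
    nodes = [-zmax + (i + 0.5) * h for i in range(n)]
    weights = [h * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) for z in nodes]
    return nodes, weights

def evolve_free(A, q, y, rule):
    """theta_A(q, y) for x(q) = 0: diffuse the initial condition A from 1 down to q."""
    nodes, weights = rule
    s = math.sqrt(1.0 - q)
    return sum(w * A(y + s * z) for z, w in zip(nodes, weights))

def c_ab_free(A, B, q, rule=None):
    """C_AB(q) = int dy P(q, y) theta_A(q, y) theta_B(q, y), Gaussian P of variance q."""
    rule = rule or gauss_rule()
    nodes, weights = rule
    s = math.sqrt(q)
    return sum(w * evolve_free(A, q, s * z, rule) * evolve_free(B, q, s * z, rule)
               for z, w in zip(nodes, weights))

if __name__ == "__main__":
    q = 0.3
    print(c_ab_free(lambda y: y, lambda y: y, q))          # should be close to q
    print(c_ab_free(lambda y: y * y, lambda y: y * y, q))  # should be close to 1 + 2 q^2
```

For nonzero $x(q)$ the drift term of the PDE would have to be included, and the simple Gaussian formulas used here no longer apply.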



4. Replica correlations in terms of Green functions

It is instructive to redisplay the formulas for $C_A$ and $C_{AB}(q)$ in terms of GF-s. Their natural generalization will yield the GF technique and the graphical representation for general correlation functions.

The time evolution of the $\vartheta$ field can be expressed by means of the GF. Based on the relation between $P(q,y)$ and the GF given by (4.72) we can write

$$ C_A = \int dy\, \mathcal{G}_\varphi(0,0;1,y)\, A(y). \tag{5.18} $$

Correlators can be conveniently represented by graphs. On the obvious case of $C_A$, see Fig. 2, we can illustrate the graph rules.

[Figure 2: axis marks $q = 0$ and $q = 1$.]

FIG. 2. Graphical representation of Eq. (5.18) for $C_A$. The line corresponds to the GF associated with the field $\varphi$. Its two $q$-coordinates are taken at the endpoints of the line and the two $y$-coordinates are integrated over. At $q = 1$ the function included in the integrand is displayed. At $q = 0$ the Dirac delta $\delta(y)$, understood in the integrand and forcing the zero $y$-argument in (5.18), is not indicated, because it is present for all correlators.



We symbolize the GF $\mathcal{G}_\varphi(q_0,y_0;q_1,y_1)$ by a line stretching between $q_0$ and $q_1$. Over the $y$-s appropriate integrations will be understood. If $q_0 = 0$ the corresponding $y_0$ is set to zero, i.e., integration is done after multiplication by a Dirac delta. Since this is always the case in our examples, we do not put any marks at $q = 0$. A weight function under the integral at $q = 1$, like $A(y)$ in (5.18), should be marked at the right end of the line. In sum, $C_A$ is a single line between $q = 0$ and $q = 1$, labeled by $A(y)$ at $q = 1$.



As to the second order correlator (5.17), based on Eqs. (4.79, 4.80) we can write $\vartheta_A$ and $\vartheta_B$ in terms of the GF and obtain

$$ C_{AB}(q) = \int dy\, dy_1\, dy_2\; \mathcal{G}_\varphi(0,0;q,y)\, \mathcal{G}_\varphi(q,y;1,y_1)\, A(y_1)\, \mathcal{G}_\varphi(q,y;1,y_2)\, B(y_2). \tag{5.19} $$

Its graphic representation is given in Fig. 3; it consists of a single vertex.



[Figure 3: leaf labels $A(y)$ and $B(y)$.]

FIG. 3. The correlation function $C_{AB}(q)$.



The third order correlator $C_{ABC}(a,b,c)$, see (5.12) for notation, can be analogously calculated. We can assume without restricting generality that $r(a,b) > r(a,c) = r(b,c)$, and use the notation $q_1 = q_{r(a,c)} < q_2 = q_{r(a,b)}$. The $q_i$-s, $i = 1, 2$, used here should not be confounded with the $q_r$-s of (4.6) from the $R$-RSB scheme. In this case the two iterations (5.4a) with respective initial conditions $A(y)$ and $B(y)$ merge at $r(a,b)$. Switching to parametrization by $q$ means that the PDE (4.49) rather than the iteration (5.4a) is to be considered. Thus (4.49) should now be used in two copies, one with initial condition $\vartheta_A(1,y) = A(y)$ and the other with $\vartheta_B(1,y) = B(y)$. They merge at $q_2$. That means, the "incoming" fields multiply to yield a new initial condition $\vartheta_{AB}(q_2,y) = \vartheta_A(q_2,y)\, \vartheta_B(q_2,y)$, like in (5.15), and hence for $q_1 < q < q_2$ the field $\vartheta_{AB}(q,y)$ obeys the PDE (4.49). At $q_1$ another merger takes place with the incoming field $\vartheta_C(q,y)$. This started from the initial condition $\vartheta_C(1,y) = C(y)$ and has evolved according to (4.49) until $q = q_1$. Here the product of the two incoming fields $\vartheta_{ABC}(q_1,y) = \vartheta_{AB}(q_1,y)\, \vartheta_C(q_1,y)$ becomes the initial condition at $q = q_1$ for the final stretch of evolution by (4.49) down to $q = 0$. The resulting correlator is easy to formulate in terms of GF-s. Indeed, (4.80) with $h = 0$ gives the solution of the PDE (4.49) starting from an arbitrary initial condition, specified at an arbitrary time. Hence

$$ \begin{aligned} C_{ABC}(q_1,q_2) &= \langle\!\langle A(y_a)\, B(y_b)\, C(y_c) \rangle\!\rangle \\ &= \int dy_1\, dy_2\, dy_3\, dy_4\, dy_5\; P(q_1,y_1)\, \mathcal{G}_\varphi(q_1,y_1;1,y_3)\, C(y_3) \\ &\quad \times \mathcal{G}_\varphi(q_1,y_1;q_2,y_2)\, \mathcal{G}_\varphi(q_2,y_2;1,y_4)\, A(y_4)\, \mathcal{G}_\varphi(q_2,y_2;1,y_5)\, B(y_5). \end{aligned} \tag{5.20} $$

The corresponding graph is in Fig. 4; it has two vertices. The special case $r(a,b) = r(a,c) = r(b,c)$ corresponds to $q_1 = q_2$. Then we wind up with a single vertex of altogether four legs, and accordingly, the $\mathcal{G}_\varphi(q_1,y_1;q_2,y_2)$ in (5.20) should be replaced by $\delta(y_1 - y_2)$.




FIG. 4. The correlation function $C_{ABC}(q_1,q_2)$.



A general correlator of local fields y can be graphically represented starting out of the full ultrametric tree [ pT| . 
This can be visualized as a tree with $R + 1$ generations of branchings and at the $r$-th generation having uniformly the connectivity $m_r/m_{r+1}$. The $(R+1)$-th generation has $n$ branches, to the end of each a "leaf" can be pinned. The leaves are labeled by the replica index $a = 1, \ldots, n$. Between $r = 0$ and $r = 1$ is the "trunk". For a (possibly large) integer number of replicas $n$ this is a well defined graph. If $n \to 0$ then the $m_r$-s cannot be held integers and possibly the $q_r$-s densely fill an interval, thus the full tree loses graphical meaning. On the other hand, the graphs representing replica correlators can be understood as subtrees of the full tree for integer $n$, and remarkably, they remain meaningful even after continuation.

In Figs. 2, 3, and 4 we illustrated the three simplest local field correlations by graphs. There a branch connecting vertices of time coordinates, say $q_1$ and $q_2 > q_1$, was associated with $\mathcal{G}_\varphi(q_1,y_1;q_2,y_2)$, with implied integrations over the local field coordinates. This feature holds also for higher order correlations. Similarly to the case explained in Section V A 2, then again the iteration (5.4), or, equivalently, the PDE (4.49) emerges. Given an interval $(q_1,q_2)$, the initial condition for a field $\vartheta$ is set at the upper border $q_2$, then $\vartheta$ undergoes evolution by the linearized PPDE (4.49), and the result is the solution at $q_1$. Since $\mathcal{G}_\varphi(q_1,y_1;q_2,y_2)$ is the GF that produces the solution of (4.49) out of a given initial condition, it is natural to associate the GF with the branch of a graph linking $q_1$ with $q_2$. Since the GF is in fact an integral kernel, integration is to be performed over variables $y_1$ and $y_2$ at the endpoints of the branch. This automatically yields the merging of incoming fields $\vartheta$ at a vertex to form a new initial condition, as exemplified (before continuation) for the second order local field correlator in Eq. (5.15). Indeed, the local field $y$ associated with a vertex at $q$ of altogether three legs is the fore $y$ argument of two incoming GF-s and the hind $y$ argument of one outgoing GF, so the latter evolves the product of the incoming $\vartheta$ fields towards decreasing times starting from $q$.

The graph rules for the general local field correlator $C_{AB\ldots Z}(a,b,\ldots,z)$, defined by (5.12), can be summarized as follows. Draw continuous lines starting out from the leaves corresponding to the replica indices $a, b, \ldots, z$ along branches until the trunk is reached. Lines will merge occasionally, and in the end all lines meet at the trunk. The merging points are specified by the merger indices $r(a,b), \ldots$, or equivalently, by the $q_{r(a,b)}, \ldots$ values from (5.14) for each pair of the replica indices we started with. Obviously, not all such $q$-s for different replica index pairs from the set $a, b, \ldots, z$ need to be different, in the extreme case all such $q$-s may be equal. The graph thus obtained is, from the topological viewpoint, uniquely determined by the given set of replica indices of a correlator. Then the explicit dependence on the replica indices $a, b, \ldots, z$ is no longer kept, instead they appear through merger indices $r(a,b), \ldots$, or, equivalently, $q_{r(a,b)}, \ldots$. This allows us to take the $n \to 0$ limit. In the end, the correlator becomes a function of all $q_{r(a,b)}, \ldots$-s that can be formed from the replica indices $a, b, \ldots, z$ of (5.12). Now that each branch merging has a given time $q$ value, it is useful to include the coordinate axis of $q$ with a graph.
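The first of these rules, drawing lines from the leaves until they merge, can be sketched for integer $n$ as a small recursive routine. This is an illustration only, not part of the original text: the function name, the nested-tuple output format, the 0-based replica labels, and the example box sizes are all assumptions made here, and the replica indices are taken to be pairwise different.

```python
def correlator_tree(replicas, box_sizes, r=0):
    """Subtree of the full ultrametric tree spanned by distinct replica labels.

    box_sizes = [m_0, ..., m_{R+1}] with m_0 = n and m_{R+1} = 1 (illustrative).
    Returns a leaf label, or a tuple (r_merge, [subtrees]) where r_merge is the
    common merger index of any two replicas taken from different subtrees.
    """
    if len(replicas) == 1:
        return replicas[0]
    # Walk towards the leaves as long as all replicas still share one box.
    while len({a // box_sizes[r + 1] for a in replicas}) == 1:
        r += 1
    # At level r + 1 the replicas fall into different boxes: a vertex at level r.
    groups = {}
    for a in replicas:
        groups.setdefault(a // box_sizes[r + 1], []).append(a)
    return (r, [correlator_tree(g, box_sizes, r + 1) for g in groups.values()])

if __name__ == "__main__":
    sizes = [8, 4, 2, 1]                         # n = 8, R = 2 (example values)
    print(correlator_tree([0, 1], sizes))        # one vertex, as for C_AB
    print(correlator_tree([0, 1, 4], sizes))     # two vertices, as for C_ABC
    print(correlator_tree([0, 2, 4, 6], sizes))  # symmetric tree with four leaves
```

The merge levels returned here are the discrete merger indices; replacing each by the corresponding $q_r$ gives the time coordinates of the vertices.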

The calculation of a correlator implies evolution by the PDE (4.49), first with different $y$ variables along the respective branches, from the leaves towards the trunk. The functions $A(y), B(y), \ldots, Z(y)$ are the initial conditions of this evolution until the first respective merging points. Whenever branches meet, say at a $q_i$, the fields $\vartheta_{(1)}(q_i,y)$, $\vartheta_{(2)}(q_i,y)$, etc., associated with the different incoming lines multiply, all having a common $y$ local field. Thus is created a new initial condition for further evolution by (4.49), from $q_i$ onward to decreasing $q$-s. At the last juncture, say $q_1$, the $y$-integral of the product of the incoming fields weighted with $P(q_1,y)$ yields the correlator in question. Obviously, the branches that connect merging points can be associated with the GF $\mathcal{G}_\varphi$ of the PDE (4.49). It follows
that at a merging point of two branches the y-integral gives the vertex function T wv of (4.8S). 
It should be noted that the correlator $C_{AB\ldots Z}(a,b,\ldots,z)$ is now expressed as an integral expression, where the product $A(y_a)\, B(y_b) \cdots Z(y_z)$ appears in the integrand. Thus an average of the more general form

$$ \langle\!\langle A(y_a, y_b, \ldots, y_z) \rangle\!\rangle \tag{5.21} $$

is obtained by our replacing $A(y_a)\, B(y_b) \cdots Z(y_z)$ by $A(y_a, y_b, \ldots, y_z)$ in that expression. Then we lose the picture of $\vartheta$ fields independently evolving from $q = 1$ by the PDE (4.49) and then merging for some smaller $q$-s, because the function $A(y_a, y_b, \ldots, y_z)$ couples the $\vartheta$ fields at the outset $q = 1$. In what follows we will not encounter averages (5.21) of non-factorizable functions.



In summary, a given correlation function is represented by a tree that is a finite subtree of the full ultrametric tree. Leaves are associated with initial conditions of the evolution by (4.49). Branches directed from larger to decreasing $q$ correspond to the GF $\mathcal{G}_\varphi$. Each vertex, including the leaves and the bottom of the trunk, has a $q, y$ pair associated with it. At the leaves $q = q_{R+1} = 1$, and there is integration over $y$-s in each vertex. At $q = 0$ simply $y = 0$ should be substituted into the final formula, so the GF of the trunk becomes just Sompolinsky's field $P$ due to (4.72). The intermediate $q$-s will be the independent variables by which we characterize the correlation function. Thus a tree uniquely defines an integral expression, furthermore, topologically identical trees correspond to the same type of integral. Of course, two topologically identical trees can have different functions associated with their respective leaves, and then the two integrals will evaluate to different results.

Elementary combinatorics gives the number N(K) of topologically different trees of K leaves in terms of a recursion. 
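Denoting the integer part of $z$ by $[z]$ we have

$$ N(1) = 1, \tag{5.22a} $$

$$ N(K) = \sum_{k=1}^{[(K-1)/2]} N(k)\, N(K-k) + \left( \left[\frac{K}{2}\right] - \left[\frac{K-1}{2}\right] \right) \binom{N(K/2)+1}{2}. \tag{5.22b} $$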



The basis of this recursion is the fact that in a tree with $K$ leaves two subtrees meet at the trunk, one having $k$ and the other $K - k$ leaves. The sum is interpreted as zero for $K = 2$. The second term on the r.h.s. contributes only for $K$ even, it gives the number of trees that are composed out of two subtrees both having $K/2$ leaves. Some terms generated by the above recursion are $N(2) = 1$, $N(3) = 1$, $N(4) = 2$, $N(5) = 3$, $N(6) = 6$, $N(7) = 11$, $N(8) = 23$. For $K = 1, 2, 3$ we have $N(K) = 1$, in accordance with our previous finding that in each of those cases there is only one graph, see Figs. 2, 3, 4.
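The recursion is easy to evaluate by machine. The following minimal sketch is an illustration rather than part of the original text; it assumes that the even-$K$ term counts unordered pairs of subtrees with $K/2$ leaves each, i.e. $N(K/2)[N(K/2)+1]/2$, an assumption that reproduces the values quoted above. The function name is arbitrary.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def num_trees(K):
    """Number of topologically different trees with K leaves and three-leg vertices."""
    if K < 1:
        raise ValueError("K must be a positive integer")
    if K == 1:
        return 1
    # Two subtrees of different sizes k < K - k meet at the trunk.
    total = sum(num_trees(k) * num_trees(K - k) for k in range(1, (K - 1) // 2 + 1))
    # For even K the two subtrees may both have K/2 leaves; count unordered pairs.
    if K % 2 == 0:
        half = num_trees(K // 2)
        total += half * (half + 1) // 2
    return total

if __name__ == "__main__":
    print([num_trees(K) for K in range(1, 9)])
```

Memoization keeps the cost polynomial in $K$, since each $N(k)$ is computed only once.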

In deriving (5.22) we assumed that vertices have altogether three legs. In that case the number of vertices is $K - 1$. If $q$-s coincide because branches shrink to a point then the number of vertices decreases and vertices with more than three legs arise. The corresponding integral expressions are consistent with the graph rules laid down before. Indeed, a branch of zero length is associated with the GF as in (4.65), i.e., gives rise to a Dirac delta equating the local fields at its two endpoints, wherefore each remaining branch still represents a GF and the vertex with more than three legs will still have a single $y$ variable to be integrated over.



5. Replica correlations of x-s 



Derivatives by $q_{ab}$ of the archetypical expression (4.1) play an important role in determining thermodynamical properties. Let us introduce the expectation values (5.1) of products of $x_a$-s

$$ C_x^{(k)}(a_1, \ldots, a_k) = (-i)^k\, \langle\!\langle x_{a_1} x_{a_2} \cdots x_{a_k} \rangle\!\rangle. \tag{5.23} $$

The $(-i)^k$ is factorized for later convenience. This is the correlation function of order $k$ of the variables $x_{a_j}$. Correlators of even, $2k$, order are related to the derivatives of (4.1) by the matrix elements $q_{ab}$ as

$$ C_x^{(2k)}(a_1, \ldots, a_{2k}) = \frac{\partial^k e^{n\varphi[\Phi(y),Q]}}{\partial q_{a_1 a_2} \cdots \partial q_{a_{2k-1} a_{2k}}}. \tag{5.24} $$


Second order correlators enter the stationarity conditions (3.21, 3.24, 3.25), and fourth order ones appear in studies of thermodynamical stability, as we shall see later.

By partial integration (5.23) can be brought to the form of the average of products of various derivatives of $\Phi(y_a)$ as

cl fe) ( 



■,a,k) 



I w ^ v ~ hQx dy - dy - • ■ ■ dy - exp ^ 



(5.25)

where coinciding replica indices give rise to higher derivatives. In the special case when all $a_j$ indices are different we have

$$ C_x^{(k)}(a_1, \ldots, a_k) = \langle\!\langle \Phi'(y_{a_1})\, \Phi'(y_{a_2}) \cdots \Phi'(y_{a_k}) \rangle\!\rangle. \tag{5.26} $$



Note that in the case of a discontinuous $\Phi(y)$ we may not use the chain rule of differentiation, therefore in (5.25) the derivatives should act directly on the exponential. Then, in the spirit of Section IV B i, we can conclude that $\mu(1,y)$ as defined in (4.113b) should be used in lieu of $\Phi'(y)$, so the field $\mu(q,y)$ defined in (4.111) evolves from $q = 1$ down until the first merging point in its way (the first vertex to be met when coming from a leaf at $q = 1$). In the following general treatment we assume a smooth $\Phi(y)$, with the note that the adaptation of the results to discontinuous ones is straightforward.



Expression (5.26) is of the form (5.12), so

$$ C_x^{(k)}(a_1, \ldots, a_k) = C_{\Phi' \ldots \Phi'}(a_1, \ldots, a_k). \tag{5.27} $$

We review some low order correlators below.



6. One- and two-replica correlators of x-s 



The simplest case of replica correlation function of $x$-s is the average of a single $x$. Eq. (5.27) for $k = 1$ becomes independent of the single replica index and gives a formula of the type (5.10) as

$$ C_x^{(1)} = \int dy\, P(1,y)\, \Phi'(y). \tag{5.28} $$

Comparison of (4.46) and (4.49) shows that with the present initial condition $\vartheta(q,y) = \mu(q,y)$. Thus, recalling that $P(1,y) = \mathcal{G}_\varphi(0,0;1,y)$, we get alternatively

$$ C_x^{(1)} = \mu(0,0). \tag{5.29} $$

This is shown in Fig. 5 graphically; it is a special case of Fig. 2.

[Figure 5: axis marks $q = 0$ and $q = 1$.]

FIG. 5. The graph of $C_x^{(1)}$ is a single line.



Let us now turn to the correlator of two $x$-s defined in (5.23). If the replica indices are different then (5.26) applies; that should be complemented to allow for coinciding indices as

$$ C_x^{(2)}(a,b) = C_{\Phi'\Phi'}(a,b) + \delta_{ab}\, C_{\Phi''}. \tag{5.30} $$



This function depends on the replica indices through the overlap $q$ at the merger, $q = q_{r(a,b)}$. The first term on the r.h.s. is a special case of the correlation function $C_{AB}(q)$ given in Eq. (5.19) with $A(y) = B(y) = \Phi'(y)$. Note, however, that the $\mu$ field satisfying (4.46) is in fact the $\vartheta$ of (4.49) starting from the initial condition $\Phi'(y)$. Therefore the two instances of convolution of the GF with $\Phi'(y)$ give $\mu(q,y)$ in (5.19) and we get

$$ \int dy\, P(q_{r(a,b)}, y)\, \mu(q_{r(a,b)}, y)^2 \tag{5.31} $$

for the first term on the r.h.s. of Eq. (5.30). The second term there is of the type studied in Section V A 2. Note that the initial condition is by (4.48b) just $\kappa(1,y)$. Furthermore, $r(a,a) = R + 1$ and $q_{aa} = q_{r(a,a)} = 1$. In summary, for the $q$-dependent two-replica correlation function we obtain



$$ C_x^{(2)}(q) = \begin{cases} \int dy\, P(q,y)\, \mu(q,y)^2 & \text{if } q < 1, \\ \int dy\, P(1,y) \left[ \mu(1,y)^2 + \kappa(1,y) \right] & \text{if } q = 1, \end{cases} \tag{5.32} $$

having omitted the subscript $r(a,b)$ from $q$. Note that the second term on the r.h.s. of (5.30) contributes at $q = 1$. The above formula can be abbreviated as

$$ C_x^{(2)}(q) = \int dy\, P(q,y) \left[ \mu(q,y)^2 + \theta(q - 1^{-0})\, \kappa(1,y) \right], \tag{5.33} $$

where the second term is nonzero only if $q = 1$. We will use the shorter notation with the Heaviside function in similar cases hereafter. Fig. 6 summarizes the result graphically.



[Figure 6: graph labels $\theta(q - 1^{-0})$ and $\Phi''(y)$.]

FIG. 6. The correlation function $C_x^{(2)}(q)$.



As it was emphasized earlier, the correlator is meaningful for $q$ arguments at the stationary $q_{ab}$-s, or at their limits for $n \to 0$. For $q$-s where $\dot{x}(q) = 0$ the extension of the correlators is not unique. For instance, we can write any $q_{(1)} < q < 1$ (for finite $R$-RSB, $q_{(1)} = q_R$, and for continuation see Section IV B 1) in lieu of $1^{-0}$ in the argument of the Heaviside function in (5.33). Note that the two-replica correlation function, like the fields obeying the PPDE and the PDE-s described in Sections IVB2 and IVB2, does not have a plateau in $(q_{(1)}, 1)$. In summary, expression (5.32) is the two-replica correlation function for both the finite $R$-RSB and $R \to \infty$, at arguments $q$ where $\dot{x}(q) \ne 0$.



7. Four-replica correlators 

The native form of the four-replica average is by (5.25)

$$ \begin{aligned} C_x^{(4)}(a,b,c,d) ={}& C_{\Phi'\Phi'\Phi'\Phi'}(a,b,c,d) + \left[ \delta_{ab}\, C_{\Phi''\Phi'\Phi'}(a,c,d) + 5 \text{ comb's} \right] \\ &+ \left[ \delta_{ab}\delta_{cd}\, C_{\Phi''\Phi''}(a,c) + 2 \text{ comb's} \right] + \left[ \delta_{abc}\, C_{\Phi'''\Phi'}(a,d) + 3 \text{ comb's} \right] \\ &+ \delta_{abcd}\, C_{\Phi^{[4]}}. \end{aligned} \tag{5.34} $$

Here "comb's" stands for combinations, then we used the shorthand notation that $\delta_{ab\ldots c} = 1$ only if all $a, b, \ldots, c$ indices are equal, else $\delta_{ab\ldots c} = 0$, furthermore, the abbreviation (4.105) is understood.

In order to simplify notation, we switch to using $q_i$ for the parametrization of expectation values. The $q_i$-s should not be confounded with the $q_r$ values introduced in (4.6) for the $R$-RSB scheme.

There are only two essentially different correlation functions, because two topologically different trees with four leaves can be drawn. Indeed, $N(4) = 2$, cf. Eq. (5.22). The graphs are shown in Fig. 7. They correspond to the first term on the r.h.s. of Eq. (5.34) and thus represent the case when all replica indices are different. Taking into account coinciding indices is somewhat involved both analytically and graphically, so we give below only the formulas.

The graph in Fig. 7a corresponds to

C^ 1) (quq 2 ,q 3 ) = J dyP( gi , y) E (q u y; q 2 ) E y; q 3 ) + 9( gi - l" ) J dj/P(l, y) (5.35) 



where

$$ \Xi(q_1,y_1;q_2) = \int dy_2\, \mathcal{G}_\varphi(q_1,y_1;q_2,y_2) \left[ \mu(q_2,y_2)^2 + \theta(q_2 - 1^{-0})\, \Phi''(y_2) \right]. \tag{5.36} $$

Note that $\Xi(q_1,y_1;q_2)$ can be considered as a generalized two-replica correlation with extra $q_1, y_1$ dependence, because $\Xi(0,0;q) = C_x^{(2)}(q)$. The inequalities

$$ q_1 \le q_2 \le 1, \qquad q_1 \le q_3 \le 1 \tag{5.37} $$

are understood, so the last term on the r.h.s. of (5.35) is nonzero only if $q_i = 1$, $i = 1, 2, 3$.

The topologically asymmetric tree of Fig. 7b is associated with



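$$ \begin{aligned} C_x^{(4,2)}(q_1,q_2,q_3) ={}& \int dy_1\, dy_2\; P(q_1,y_1)\, \mu(q_1,y_1)\, \mathcal{G}_\varphi(q_1,y_1;q_2,y_2)\, \mu(q_2,y_2)\, \Xi(q_2,y_2;q_3) \\ &+ \theta(q_2 - 1^{-0}) \int dy_1\, dy_2\; P(q_1,y_1)\, \mu(q_1,y_1)\, \mathcal{G}_\varphi(q_1,y_1;1,y_2)\, \Phi'''(y_2), \end{aligned} \tag{5.38} $$

where we assume

$$ q_1 \le q_2 \le q_3 \le 1 \tag{5.39} $$

but also require $q_1 < 1$, because the case $q_1 = 1$ has been settled by Eq. (5.35).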



[Figure 7: panels (a) and (b), with the $q$ axis marked at $q_1$, $q_2$, and $1$.]

FIG. 7. The correlation functions (a) $C_x^{(4,1)}(q_1,q_2,q_3)$, (b) $C_x^{(4,2)}(q_1,q_2,q_3)$, when all $q_i < 1$ and are different from each other. The $\Phi'(y)$ functions at the tip of the branches at $q = 1$ are understood but not marked.



In conclusion, given the GF for the linear PDE (4.49), correlation functions can be calculated in principle. Interestingly, the GF for a Fokker-Planck equation here also assumes the role of the traditional field theoretical GF. Note that this is an instance where a mean-field property transpires: the graphs to be calculated are all trees. It
should be added that here the tree structure is the d irect consequence of ultrametricity |Q, and may carry over to 
non-mean-field-like systems with ultrametricity J225[ . That simple form of graphs is a priori far from obvious, since 
there are techniques for long range interaction systems where diagrams with loops are present ||S4) . In hindsight we 
can say that by using the GF of a Fokker-Planck equation with a nontrivial drift term, we implicitly performed a 
summation of infinitely many graphs of earlier approaches. 



B. Variations of the Parisi term 

The variation of the free energy term by the OPF x(q) is necessary in order to formulate later stationarity conditions, 
and second order variations yield the matrix of stability against fluctuation of the OPF. In this chapter only the 
mathematical properties are investigated, physical significance will be elucidated later. 



1. First variation 



The main result of Section IV is that the ubiquitous term (4.1) boils down within the Parisi ansatz to (4.38), i.e.,

$$ \lim_{n \to 0} \varphi[\Phi(y), Q] = \varphi[\Phi(y), x(q)] = \varphi(0,0). \tag{5.40} $$

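In order to determine the variation of $\varphi(0,0)$ in terms of $x(q)$ we introduce small variations as $x \to x + \delta x$ and $\varphi \to \varphi + \delta\varphi$ and require that the varied quantities also satisfy the PPDE (4.36a) with the same initial condition (4.36b) for $\varphi + \delta\varphi$. Linearization of the PPDE in the variations gives

$$ \partial_q \delta\varphi = -\tfrac12\, \partial_y^2 \delta\varphi - x\, \mu\, \partial_y \delta\varphi - \tfrac12\, \mu^2\, \delta x, \tag{5.41a} $$

$$ \delta\varphi(1,y) = 0, \tag{5.41b} $$

where $\mu(q,y) = \partial_y \varphi(q,y)$ satisfies the PDE (4.46). Eq. (5.41) is an inhomogeneous, linear PDE for $\delta\varphi(q,y)$, given $x(q)$, $\delta x(q)$, and $\mu(q,y)$. Note that this is of the form of the linearized PPDE with source (4.79). Its solution is given in (4.80), whence

$$ \delta\varphi(q_1,y_1) = \frac12 \int_{q_1}^{1} dq_2 \int dy_2\; \mathcal{G}_\varphi(q_1,y_1;q_2,y_2)\, \mu(q_2,y_2)^2\, \delta x(q_2), \tag{5.42} $$

whence

$$ \frac{\delta\varphi(q_1,y_1)}{\delta x(q_2)} = \frac12\, \theta(q_2 - q_1) \int dy_2\; \mathcal{G}_\varphi(q_1,y_1;q_2,y_2)\, \mu(q_2,y_2)^2. \tag{5.43} $$

Thus the variation of the term (5.40) is

$$ \frac{\delta\varphi(0,0)}{\delta x(q)} = \frac12 \int dy\; \mathcal{G}_\varphi(0,0;q,y)\, \mu(q,y)^2 = \frac12 \int dy\, P(q,y)\, \mu(q,y)^2. \tag{5.44} $$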



(5.44) 



Here we used the identity (4.72) between the GF and the field $P(q,y)$. It is interesting that the above formula is in fact proportional to the two-replica correlation of Eq. (5.33)

$$ \frac{\delta\varphi(0,0)}{\delta x(q)} = \frac12\, C_x^{(2)}(q) \tag{5.45} $$

for $q < 1$. Since the correlation function can also be obtained by differentiation in terms of $q_{ab}$, we have by Eqs. (4.1, 5.32), for $q < 1$



lim 

n—>0 



dny[${y),Q] 



'lab 



^(0,0) 
6x(q) 



(5.46) 



This relation tells us that if a free energy is the sum of terms (4.1) then the two stationarity conditions, one obtained by differentiation in terms of the matrix elements $q_{ab} = q$ and the other by variation in terms of $x(q)$, are equivalent. Such is the case for the SK model, the spherical neuron, and the neuron with arbitrary, independent synapses.

In the case of a discrete $R$-RSB scheme (4.5) variation by $x(q)$ is made with the assumption of a plateau, i.e., $x(q) \equiv x$, $0 < x < 1$, in an interval $I$. Then the role of the variation will be taken over by the derivative in terms of the plateau value $x$ and of the endpoints $q_1$ and $q_2$. It is straightforward to show that

$$ \frac{\partial\varphi(0,0)}{\partial x} = \frac12 \int_I C_x^{(2)}(q)\, dq \tag{5.47} $$



results. Since the fields P and n are purely diffusive in J, the g-integral is Gaussian. On the other hand, the derivatives 
in terms of the endpoints are \C^p at the endpoints due to Eqs. (|5.46 , 5.45). If we work with an ansatz for the OPF 
that has both $\dot{x}(q) > 0$ and $x(q) \equiv x$, $0 < x < 1$, segments, then (5.44) should be used in an interval where $\dot{x}(q) > 0$ and (5.47) along a plateau. If $\dot{x}(q) > 0$ at isolated points, like in a finite $R$-RSB scheme at jumps, differentiation in terms of the location of those points results in (5.44) at those points.



2. Second variation 



The stability of a thermodynamic state against fluctuations in the space of the OPF $x(q)$, the so called longitudinal fluctuations, can be studied through the second variation of the free energy term (5.40). We will present here briefly the way the longitudinal Hessian can be calculated.



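In order to determine the variation of the first derivative (5.44) we should vary the fields $\mu$ and $P$. For $\mu$ we obtain by definition

$$ \frac{\delta\mu(q_1,y_1)}{\delta x(q_2)} = \partial_{y_1} \frac{\delta\varphi(q_1,y_1)}{\delta x(q_2)} = \frac12\, \theta(q_2 - q_1) \int dy_2\; \partial_{y_1} \mathcal{G}_\varphi(q_1,y_1;q_2,y_2)\, \mu(q_2,y_2)^2. \tag{5.48} $$

In order to calculate the variation of the field $P$ we need to vary the SPDE (4.53). This yields

$$ \partial_q \delta P = \tfrac12\, \partial_y^2 \delta P - x\, \partial_y(\mu\, \delta P) - x\, \partial_y(P\, \delta\mu) - \delta x\, \partial_y(\mu P), \tag{5.49a} $$

$$ \delta P(0,y) = 0. \tag{5.49b} $$

This can be solved by using the fact that the GF for the SPDE is the reverse of $\mathcal{G}_\varphi$. Thus

$$ \begin{aligned} \delta P(q_1,y_1) = -\int_0^{q_1} dq_2 \int dy_2\; & \mathcal{G}_\varphi(q_2,y_2;q_1,y_1) \\ & \times \left\{ x(q_2)\, \partial_{y_2}\!\left( P(q_2,y_2)\, \delta\mu(q_2,y_2) \right) + \partial_{y_2}\!\left( P(q_2,y_2)\, \mu(q_2,y_2) \right) \delta x(q_2) \right\}. \end{aligned} \tag{5.50} $$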



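Hence the variation of $P(q_1,y_1)$ by $x(q_2)$ is straightforward to obtain, where also Eq. (5.48) should be used. The above preliminaries allow us to express the second variation of the free energy functional. Varying (5.44) gives

$$ \frac{\delta^2\varphi(0,0)}{\delta x(q_1)\, \delta x(q_2)} = \frac12 \int dy_1 \left[ \frac{\delta P(q_1,y_1)}{\delta x(q_2)}\, \mu(q_1,y_1)^2 + 2\, P(q_1,y_1)\, \mu(q_1,y_1)\, \frac{\delta\mu(q_1,y_1)}{\delta x(q_2)} \right]. \tag{5.51} $$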



Substitution of the variation of $P(q_1,y_1)$ and of $\mu(q_1,y_1)$ yields after some manipulations

$$ \begin{aligned} \frac{\delta^2\varphi(0,0)}{\delta x(q_1)\, \delta x(q_2)} ={}& \frac12 \int dy_1\, dy_2\; \partial_{y_1} \mathcal{G}_\varphi(q_{\min},y_1;q_{\max},y_2)\, P(q_{\min},y_1)\, \mu(q_{\min},y_1)\, \mu(q_{\max},y_2)^2 \\ &+ \frac14 \int_0^{q_{\min}} dq_3\, x(q_3) \int dy_1\, dy_2\, dy_3\; P(q_3,y_3) \\ &\quad \times \partial_{y_3} \mathcal{G}_\varphi(q_3,y_3;q_1,y_1)\, \partial_{y_3} \mathcal{G}_\varphi(q_3,y_3;q_2,y_2)\, \mu(q_1,y_1)^2\, \mu(q_2,y_2)^2, \end{aligned} \tag{5.52} $$

where

$$ q_{\min} = \min(q_1,q_2), \tag{5.53a} $$

$$ q_{\max} = \max(q_1,q_2). \tag{5.53b} $$

Note the symmetry of (5.52) w.r.t. the interchange of $q_1$ and $q_2$. If we have the extremizing $x(q)$ as well as the GF $\mathcal{G}_\varphi$, the latter yielding by (4.81) the field $\mu$, then Eq. (5.52) is an explicit expression for the second functional derivative.
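The manipulations leading to (5.52) are not spelled out in the text. One possible route is sketched below; it is a reconstruction, not the author's derivation. It assumes the form $\delta\varphi(0,0)/\delta x(q_1) = \frac12 \int dy_1\, P\, \mu^2$ for (5.44), the expression (5.48) for $\delta\mu/\delta x$, a $q$-integration in (5.50) that runs from $0$ to $q_1$, and boundary terms that vanish in the partial integrations over the local fields.

```latex
% Term A: variation of \mu in (5.44), using (5.48)
A = \int dy_1\, P(q_1,y_1)\, \mu(q_1,y_1)\, \frac{\delta\mu(q_1,y_1)}{\delta x(q_2)}
  = \tfrac12\, \theta(q_2-q_1) \int dy_1\, dy_2\; P(q_1,y_1)\, \mu(q_1,y_1)\,
    \partial_{y_1}\mathcal{G}_\varphi(q_1,y_1;q_2,y_2)\, \mu(q_2,y_2)^2 .

% Term B: explicit \delta x part of (5.50), after partial integration in y_2
B = \tfrac12\, \theta(q_1-q_2) \int dy_1\, dy_2\; P(q_2,y_2)\, \mu(q_2,y_2)\,
    \partial_{y_2}\mathcal{G}_\varphi(q_2,y_2;q_1,y_1)\, \mu(q_1,y_1)^2 .

% A + B combine into the first, symmetric term of (5.52) written with q_min, q_max.

% Term C: \delta\mu part of (5.50), after partial integration in y_3 and use of (5.48)
C = \tfrac14 \int_0^{q_{\min}} dq_3\, x(q_3) \int dy_1\, dy_2\, dy_3\; P(q_3,y_3)\,
    \partial_{y_3}\mathcal{G}_\varphi(q_3,y_3;q_1,y_1)\,
    \partial_{y_3}\mathcal{G}_\varphi(q_3,y_3;q_2,y_2)\,
    \mu(q_1,y_1)^2\, \mu(q_2,y_2)^2 .
```

The upper limit $q_{\min}$ in the last term arises because the $q_3$-integration is restricted both by $q_3 < q_1$ from (5.50) and by $q_3 < q_2$ from the step function in (5.48).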



C. The Hessian matrix 



There are results in the literature on the algebraic properties of ultrametric matrices that can be straightforwardly applied to the present problem. As we shall see below, this amounts to finding, in the state described by a general OPF $x(q)$, an explicit expression for the eigenvalues of the Hessian in the so called replicon sector, deemed to be "dangerous" from the viewpoint of thermodynamical stability.



1. Ultrametric matrices 



The Hessian, or, stability matrix of the free energy term (4.1) is

$$ M_{ab,cd} = \frac{\partial^2\, n\varphi[\Phi(y),Q]}{\partial q_{ab}\, \partial q_{cd}}. \tag{5.54} $$

If the replica correlations of $x_a$-s as in (5.24) are thought of as moments then (5.54) is analogous to a cumulant, and can obviously be expressed as

$$ M_{ab,cd} = \langle\!\langle x_a x_b x_c x_d \rangle\!\rangle - \langle\!\langle x_a x_b \rangle\!\rangle\, \langle\!\langle x_c x_d \rangle\!\rangle = C_x^{(4)}(a,b,c,d) - C_x^{(2)}(a,b)\, C_x^{(2)}(c,d). \tag{5.55} $$



The transposition symmetry of the matrix $Q$ was understood in the above definition. The Hessian (5.54) becomes a so called ultrametric matrix [111] once the $R$-RSB form (4.4) for $Q$ is substituted. Note that while constructing the stability matrix we did not differentiate in terms of the indices $x_r$. Indeed, one produces the Hessian before the hierarchical form for $Q$ is substituted, and at that stage the parameters of the $R$-RSB scheme do not appear.

We can now comfortably apply the results of the elaborate study by Temesvari, De Dominicis, and Kondor [111] about ultrametric matrices. Such matrices have four replica indices and are in essence defined by the property that they exhibit the same symmetries w.r.t. the interchange of indices as the Hessian (5.54) with a Parisi $Q$ matrix substituted in it. The theory was originally formulated for finite $R$-RSB [111], but, as we shall see, continuation of the formulas comes naturally. Firstly we should clarify notation. Let us remind the reader of the merger index $r(a,b)$ defined in the $R$-RSB ansatz by Eq. (5.14) in Section V A 5. The $r(a,b)$ was denoted by $a \cap b$ in Ref. [111]. According to the convention of [111], the elements of the ultrametric matrix $M$ can be characterized in a symmetric way by four merger indices, among them three independent. Redundancy is the price paid for a symmetric definition. The new indices are



$$ r_0 = r(a,b), \tag{5.56a} $$

$$ r_1 = r(c,d), \tag{5.56b} $$

$$ r_2 = \max[r(a,c), r(a,d)], \tag{5.56c} $$

$$ r_3 = \max[r(b,c), r(b,d)], \tag{5.56d} $$

whence

$$ M^{r_0,r_1}_{r_2,r_3} = M_{ab,cd} \tag{5.57} $$

is just a relabeling of the Hessian matrix elements.
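For integer $n$ the relabeling can be carried out mechanically. The sketch below is illustrative and not from the original text: it uses the box construction of the $R$-RSB ansatz with 0-based replica labels, and the function names, the ordering of the returned labels, and the example box sizes are assumptions.

```python
def merger_index(a, b, box_sizes):
    """Largest r for which replicas a and b share a box of size box_sizes[r]."""
    r_merge = 0
    for r, m in enumerate(box_sizes):
        if a // m != b // m:
            break
        r_merge = r
    return r_merge

def hessian_labels(a, b, c, d, box_sizes):
    """Four merger indices labeling the matrix element M_{ab,cd}, cf. Eqs. (5.56)."""
    def r(x, y):
        return merger_index(x, y, box_sizes)
    return (r(a, b), r(c, d), max(r(a, c), r(a, d)), max(r(b, c), r(b, d)))

if __name__ == "__main__":
    sizes = [8, 4, 2, 1]                      # n = 8, R = 2 (example values)
    print(hessian_labels(0, 1, 2, 3, sizes))  # pairs (0,1) and (2,3) in neighbouring boxes
    print(hessian_labels(0, 1, 0, 1, sizes))  # diagonal element M_{ab,ab}
```

Different index quadruples that yield the same four labels correspond to equal matrix elements, which is the symmetry that the decomposition into sectors exploits.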

According to [111] one can distinguish among three main invariant subspaces, called sectors, of the space of $Q$ matrices. Here we give a loosely worded brief account of the decomposition, emphasizing also the physical picture that transpires from comparison with earlier results on the SK model.

The longitudinal sector is spanned by Parisi matrices with the same set of $m_r$, or, equivalently, $x_r$ (its relation to the $m_r$ is given by (4.12)), indices as the matrix $Q$ had that was substituted into (5.54). In the general case (without restrictions like the fixing of the diagonal elements) this space has $R + 1$ dimensions. The projection of the Hessian onto the longitudinal sector is an $(R+1) \times (R+1)$ matrix, whose diagonalization cannot be performed based solely on its ultrametric symmetry, but should be done differently for different free energy terms $\varphi[\Phi(y),Q]$. The longitudinal Hessian in the $R \to \infty$ limit is related to the Hessian of the functional $\varphi[\Phi(y),x(q)]$ (see Section V B 2). This is demonstrated by the variational stability analysis of the SK model, within the continuous RSB scheme, near the spin
glass transition, as performed in Ref. p7| . The eigenvalue equation obtained by variation was r ecovered by taking 
the $R \to \infty$ limit of the eigenvalue problem within the longitudinal sector of the Hessian (5.54). The longitudinal subspace can be considered as the generalization of a deviation from the RS solution that equally has RS structure, i.e., the longitudinal eigenvector of de Almeida and Thouless (AT) [96].

The second sector has been called anomalous in Ref. [111]. It may be viewed as the generalization of the second family of AT eigenvectors. The ultrametric symmetry allowed the transformation of the Hessian restricted to this invariant subspace into a quasi-diagonal form of $n - 1$ pieces of $(R+1) \times (R+1)$ matrices [111]. Some of these submatrices are identical, there are only $R$ different ones among them in the generic case. Again, the diagonalization of these submatrices is a task to be performed on a case-by-case basis. To our knowledge no such study has been performed for $R > 1$.

The third is the so called replicon sector. Here the ultrametric symmetry made it possible to fully diagonalize the Hessian, resulting in an explicit expression for the replicon eigenvalues in terms of Hessian matrix elements [111]. The replicon modes, the elements of this subspace, are the generalization of the eigenvectors of de Almeida and Thouless that destabilized the RS solution of the SK model. In other words, these can be thought of as responsible for replica
symmetry breaking. In the stability analysis by Whyte and Sherrington ][llj on the 1-RSB solution of the storage 
problem of the spherical neuron (by Ref. ]Q) it was equally the replicon eigenvalue that caused thermodynamical 
instability. Note that the replicon modes were also termed as ergodons by Nieuwenhuizen J68| , |69[ , due to their role in 
the breakdown of ergodicity in an RSB phase. 



2. Replicons 



The replicon sector has special physical significance, since instability there in known cases signaled the need for
higher order R-RSB. The replicon eigenvalues of an ultrametric matrix can be written as [111]

$$ \lambda_{r_1, r_2, r_3} = \sum_{s=r_2}^{R} \sum_{t=r_3}^{R} m_{s+1}\, m_{t+1} \left( M^{r_1,r_1}_{s+1,t+1} - M^{r_1,r_1}_{s+1,t} - M^{r_1,r_1}_{s,t+1} + M^{r_1,r_1}_{s,t} \right), \tag{5.58} $$

where $0 \le r_1 \le R$ and $r_1 \le r_2, r_3 \le R$. The $r_j$-s are no longer attached to replica labels as they had been in Eqs.
(5.56). This discrete expression lends itself to continuation, when one uses the parametrization by $q_{r_j}$ to relabel as



$$ M(q_{r_1}, q_{r_2}, q_{r_3}) = M^{r_1,r_1}_{r_2,r_3}. \tag{5.59} $$



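Here the inequalities (5.37) are implied. Using the simpler notation of $q_i$-s for parameterization we get for the replicon
eigenvalues

$$ \lambda(q_1, q_2, q_3) = \int_{q_2+0}^{1} d\bar q_2 \int_{q_3+0}^{1} d\bar q_3 \; x(\bar q_2)\, x(\bar q_3)\, \partial_{\bar q_2} \partial_{\bar q_3} M(q_1, \bar q_2, \bar q_3). \tag{5.60} $$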



Comparison with the sum above shows that the inequalities $q_1 \le q_2, q_3 \le q_{(1)}$ $(= q_R)$ need to hold, and, of course,
the eigenvalue is defined only at those $q_i$-s where $\dot x(q_i) \ne 0$. Expression (5.60) is unambiguous even though the
correlation functions, and so the integrand, are ill defined over intervals where $x(\bar q_i)$ has a plateau. In such an interval
the integrand becomes a derivative and we define the quadrature as the difference between values at the endpoints of
the interval. Eq. (5.60) is equivalent to a formula expressed in terms of the variable $x$ that was quoted in [221]. We
call the reader's attention also to the fact that the continuation of the sum (5.58) implies that in case of ambiguity
the right-hand-side limit in $q$ of the partial derivatives is to be used. This distinction is generically of no import in
regions where $\infty > \dot x(q) > 0$, but must be made at steps, where the left and right limits are different. The
lower integration limits in (5.60) carry the superscript $+0$ for this reason. In order to simplify notation, hereafter we
often omit the mark $+0$ but understand it tacitly wherever necessary.



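Next we use the expression of the Hessian through correlators as given by (5.55). After inspection of how the
discrete labeling was converted to continuous parametrization we get

$$ M(q_1, q_2, q_3) = C_x^{(4,1)}(q_1, q_2, q_3) - C_x^{(2)}(q_1)^2, \tag{5.61} $$

where the fourth order correlator defined in (5.35) appears. Hence the replicon spectrum is

$$ \lambda(q_1, q_2, q_3) = \int_{q_2}^{1} d\bar q_2 \int_{q_3}^{1} d\bar q_3 \; x(\bar q_2)\, x(\bar q_3)\, \partial_{\bar q_2} \partial_{\bar q_3} C_x^{(4,1)}(q_1, \bar q_2, \bar q_3). \tag{5.62} $$

From expression (5.35) for the correlator we obtain

$$ \lambda(q_1, q_2, q_3) = \int dy\, P(q_1, y)\, \Lambda(q_1, y; q_2)\, \Lambda(q_1, y; q_3), \tag{5.63} $$

where by definition, with $\Xi$ denoting the generalized two-replica correlator of (5.36),

$$ \Lambda(q_1, y_1; q_2) = \int_{q_2}^{1} d\bar q_2 \; x(\bar q_2)\, \partial_{\bar q_2} \Xi(q_1, y_1; \bar q_2). \tag{5.64} $$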



Using Eq. (5.36) for the generalized two-replica correlator and the identity (4.88), then substituting for the product $x(q)\, \kappa(q,y)^2$ the other terms in Eq.
(4.48a), next performing partial integration and noting that $G_\varphi$ satisfies in its hind variables the SPDE (4.53), we
obtain

$$ \Lambda(q_1, y_1; q_2) = \int dy_2\, G_\varphi(q_1, y_1; q_2, y_2)\, \kappa(q_2, y_2). \tag{5.65} $$

The replicon spectrum can be expressed equivalently by the vertex function (4.85) as

$$ \lambda(q_1, q_2, q_3) = \int dy_2\, dy_3\, \Gamma_{\varphi\varphi\varphi}(q_1; 0, 0; q_2, y_2; q_3, y_3)\, \kappa(q_2, y_2)\, \kappa(q_3, y_3). \tag{5.66} $$



This formula can be graphically represented, if we recall that the field $\kappa$ is produced by the GF for the PDE
(4.46a) by (4.82). Let us mark $G_\mu$ with a dashed line, then we have the graph on Fig. 8.

FIG. 8. The replicon eigenvalue in terms of GF-s. The full line is $G_\varphi$ as before, the dashed line represents $G_\mu$, the GF for the
PDE (4.46a).



Here we reemphasize that the solution of the relevant PDE-s, in particular, the field $\varphi(q, y)$ with its derivatives and
the GF-s are assumed to be known, so the correlation functions and the replicon spectrum are considered as resolved
if they are expressed in terms of the above fields and GF-s.



3. A Ward-Takahashi identity 



Recent results indicate the existence of an infinite series of identities among derivatives of a function of $Q$, such
as a free energy term, provided this term exhibits permutation symmetry in replica indices and the derivatives are
considered with a Parisi matrix substituted as argument [221, 226]. An equivalent source of the same identities is a
"gauge" invariance, namely, the property that the free energy term loses its dependence on the specific $m_r$ and $q_r$
values and winds up depending only on $x(q)$ in the $n \to 0$ limit [226]. These relations can be considered as analogous
to the Ward-Takahashi identities (WTI-s), arising in field theory for a thermodynamical phase wherein a continuous
symmetry is spontaneously broken [227]. The continuous symmetry that is held responsible for the WTI-s is the
replica permutation symmetry in the $n \to 0$ limit, together with the appearance of an interval in $q$ where $x(q)$ is
continuous and strictly increasing. In our case, the free energy term (4.1) is of the aforementioned type, so
we expect the WTI-s to hold.

Interestingly, the lowest order nontrivial WTI can be easily obtained based on the results expounded in the previous
section. Let us consider the replicon eigenvalues (5.66) in the case of coinciding arguments

$$ q = q_1 = q_2 = q_3 \le q_{(1)}. \tag{5.67} $$



The behavior of the vertex function $\Gamma_{\varphi\varphi\varphi}$ for coinciding $q$-arguments can be easily deduced from the requirement that
the GF-s become Dirac deltas for coinciding times. Then the replicon eigenvalue assumes the form

$$ \lambda(q, q, q) = \int dy\, P(q, y)\, \kappa(q, y)^2 = \lambda(q). \tag{5.68} $$



This is precisely the r.h.s. of the identity (4.88) at $q_1 = 0$, $y_1 = 0$, while on the l.h.s. of same we discover the $q$-derivative of the 2nd
order correlator (5.33) for $q < 1$, therefore

$$ \lambda(q) = \dot C_x^{(2)}(q). \tag{5.69} $$



Strictly, this formula should be taken only at $q$-s where the correlation function is defined, i.e., $q$-s that are limits of
some $q_r$-s in the Parisi scheme (4.6). Nevertheless, we find that it holds with the smooth continuation of (5.33) and
(5.68) for any $0 \le q < 1$, the more so remarkable because the replicon eigenvalues were not defined for arguments
larger than $q_{(1)}$.

Our present derivation yields just one identity out of a set of infinitely many, but its advantage is that it uses
analytic forms, and it is brief due to our prior knowledge about the properties of the relevant PDE-s. Note that the
WTI (5.69) was obtained for a mathematical abstraction, the formula (4.1), but will gain physical significance once
we return to thermodynamics in Sections VII and VIII.

VI. INTERPRETATION AND SPECIAL PROPERTIES



A. Physical meaning of x(q) 



In relation to spin glasses it has been shown that the OPF $x(q)$ is the average probability that the overlap of two
spin configurations from two different pure (macro)states is smaller than $q$ [ 1 1C ] . Furthermore, this property was
found to naturally hold for combinatorial optimization problems that can be mapped to various spin glass models
f[4j| . A similar feature evidently follows from Parisi's ansatz for $Q$ in the present neuron model, but because of its
significance we briefly give the derivation. Several further consequences of the hierarchical form of $Q$, as discussed in
jbjj, also carry over to the neuron in the case of RSB.

Firstly let us consider the expression (4.8), where we replace $x_i$ by 1 and $q_{ab}$ by some function of the elements,
$F(q_{ab})$. We obtain, using $m_0 = n \to 0$,

$$ \frac{1}{n} \sum_{a \ne b} F(q_{ab}) \Big|_{n=0} = -F(1) + \sum_{r=0}^{R+1} m_r \left[ F(q_r) - F(q_{r-1}) \right], \tag{6.1} $$

whence, by continuation in the sense of Section IV B 1,

$$ \frac{1}{n} \sum_{a \ne b} F(q_{ab}) \Big|_{n=0} = -F(q_{(1)}) + \int_0^{q_{(1)}} dq\, F'(q)\, x(q) = -\int_0^1 dq\, \dot x(q)\, F(q). \tag{6.2} $$

Here the assumption that only nonnegative $q$-s are relevant and $q_{R+1} = q_D = 1$ was used.

A density for the off-diagonal matrix elements of $Q$ can be obtained by substituting the Dirac delta for $F(q)$ as

$$ \frac{2}{n(n-1)} \sum_{a<b} \delta(q - q_{ab}) \Big|_{n=0} = \int_0^1 d\bar q\, \dot x(\bar q)\, \delta(q - \bar q) = \dot x(q). \tag{6.3} $$

Finally, using the notation $\langle \dots \rangle_{(n)}$ for the thermal average with $n$ replicated partition functions, also averaged over the
patterns, the mean probability density of overlaps $P(q)$ is, by the definition of $q_{ab}$,

$$ P(q) = \frac{2}{n(n-1)} \sum_{a<b} \left\langle \delta(q - N^{-1} \mathbf{J}_a \cdot \mathbf{J}_b) \right\rangle_{(n)} \Big|_{n=0} = \frac{2}{n(n-1)} \sum_{a<b} \left\langle \delta(q - q_{ab}) \right\rangle_{(n)} \Big|_{n=0}. \tag{6.4} $$

Since the quantity to be averaged on the r.h.s. does not depend exponentially on $N$, the saddle point known from
the free energy calculation does not move. The average $\langle \dots \rangle_{(n)}$ can thus be obtained by simple substitution of the
saddle point value in the Dirac deltas, that is, the $\langle \dots \rangle_{(n)}$ sign can be removed and we obtain (6.3), that is,

$$ P(q) = \dot x(q). \tag{6.5} $$



The $P(q)$ considered here is not to be confounded with the probability field $P(q, y)$ of Section IV B 3. This interpretation
of $x(q)$ indeed restricts the physically relevant space to monotonous functions. A further consequence that should
be borne in mind is that $q$-s where $P(q) = 0$ have vanishing relative weight in the thermodynamical limit. So any
quantity depending on $q$ carries direct physical meaning only for $q$-s where $\dot x(q) > 0$. This reservation will hereafter
be understood. The significance of the $x(q)$ (or $q(x)$) order parameter in long range interaction systems extends to the
finite range problems. Indeed, the "mean field" $q(x)$ plays a role also in the field theory of spin glasses as discussed
in Ref. [22].



It should be emphasized that the distribution $P_S(q)$ of overlaps for a given instance $S$ of the patterns is not
self-averaging. So the quenched average included in $\langle \dots \rangle_{(n)}$ and hence in the definition of $P(q)$ leads to loss of information
about the distribution of the random variable $q$.



B. Diagonalization of a Parisi matrix 



Since spectral properties of Parisi matrices (4.4) play an essential role in our framework, here we briefly review
known results about them (see, e.g., Refs. @f7§). Only the case $q_{aa} = q_D = 1$ will be considered here; extension to
any diagonals is straightforward. The eigenvalue problem is

$$ Q\, v^{(r)} = D^{(r)}\, v^{(r)}, \tag{6.6} $$

where $r$ labels the eigenvalues and -vectors. The simplest eigenvector belongs to $r = 0$ and has uniform elements,
say $v^{(0)} = (1, 1, \dots, 1)$. The $r = 1$ subspace is spanned by vectors, orthogonal to $v^{(0)}$, that are uniform over boxes
of the first generation, each having $m_1$ number of elements. An example is $v^{(1)}_a = 1$ if $a = \ell_1 m_1 + 1, \dots, (\ell_1 + 1) m_1$,
$v^{(1)}_a = -1$ if $a = \ell_2 m_1 + 1, \dots, (\ell_2 + 1) m_1$, with integers $\ell_1 \ne \ell_2$, both smaller than $n/m_1$, and $v^{(1)}_a = 0$ for other $a$-s. For a general $r$,
the eigenvectors are uniform over boxes of size $m_r$ and orthogonal to all eigenvectors of lower indices, yielding the
eigenvalues

$$ D^{(r)} = \sum_{p=r}^{R+1} m_p\, (q_p - q_{p-1}). \tag{6.7} $$

The dimension of the space of vectors uniform in boxes of size $m_r$ is $n/m_r$; this space is spanned by all eigenvectors
of index not larger than $r$. Given the fact that the $r = 0$ eigenvalue is non-degenerate, it follows that the degeneracy
of the $r$-th, $r > 0$, eigenvalue is

$$ n_r = n \left( \frac{1}{m_r} - \frac{1}{m_{r-1}} \right). \tag{6.8} $$

Continuation of (6.7) in the sense of Section IV B 1 results in eigenvalues indexed by $q$ as

$$ D(q) = \int_q^1 d\bar q\, x(\bar q). \tag{6.9} $$

In the case of finite R-RSB, comparison with (6.7) gives

$$ D(q_r) = D^{(r+1)}, \tag{6.10} $$



thus formula (6.9) incorporates both the R-RSB case and the one when $x(q)$ is made up of plateaus and curved
segments. According to the conclusions of Section VI A, whereas the function $D(q)$ is defined for all $0 \le q \le 1$, it
gives eigenvalues only for $q$-s where $\dot x(q) > 0$. In particular, after continuation and with the notation of Section IV B 1,
$x(q) \equiv 1$ in the interval $[q_{(1)}, 1]$, so we have there from (6.9)

$$ D(q) = 1 - q. \tag{6.11} $$

While $D(q_{(1)})$ is an eigenvalue, $D(q_{(1)}) = 1 - q_{(1)} = D^{(R+1)}$, the $D(q)$ from Eq. (6.11) does not have the meaning of an eigenvalue
for $q > q_{(1)}$.
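The spectrum described above can be checked on a small integer-$n$ example. The following is a minimal sketch, not taken from the paper: the helper names (`parisi_matrix`, `eigenvalue`, `degeneracy`, `sample_eigenvector`) and the numerical values of the $m_r$ and $q_r$ are illustrative assumptions. It builds a 2-RSB Parisi matrix with unit diagonal for $n = 8$ and verifies that vectors uniform over boxes of size $m_r$ are eigenvectors with the eigenvalue $\sum_{p \ge r} m_p (q_p - q_{p-1})$.

```python
def parisi_matrix(m, q):
    """Q for box sizes m = [m_0 = n, m_1, ..., m_{R+1} = 1] and
    overlaps q = [q_0, ..., q_R, q_{R+1} = q_D] (illustrative helper)."""
    n = m[0]
    # Q_ab = q_r, where r is the deepest level at which a and b share a box.
    return [[q[max(r for r in range(len(m)) if a // m[r] == b // m[r])]
             for b in range(n)] for a in range(n)]

def eigenvalue(m, q, r):
    """D^(r) = sum_{p=r}^{R+1} m_p (q_p - q_{p-1}), with q_{-1} = 0."""
    q_prev = [0.0] + list(q[:-1])
    return sum(m[p] * (q[p] - q_prev[p]) for p in range(r, len(m)))

def degeneracy(m, r):
    """1 for r = 0, and n (1/m_r - 1/m_{r-1}) for r > 0."""
    n = m[0]
    return 1 if r == 0 else n // m[r] - n // m[r - 1]

def sample_eigenvector(m, r):
    """Uniform vector for r = 0; otherwise +1 on the first box of size m_r,
    -1 on the second one, 0 elsewhere."""
    n = m[0]
    if r == 0:
        return [1.0] * n
    return [1.0 if a < m[r] else -1.0 if a < 2 * m[r] else 0.0
            for a in range(n)]

def matvec(Q, v):
    return [sum(Qab * vb for Qab, vb in zip(row, v)) for row in Q]

m = [8, 4, 2, 1]          # assumed example: n = 8, m_1 = 4, m_2 = 2
q = [0.1, 0.3, 0.6, 1.0]  # assumed example: q_0 < q_1 < q_2 < q_D = 1
Q = parisi_matrix(m, q)
residuals = []
for r in range(len(m)):
    v = sample_eigenvector(m, r)
    D = eigenvalue(m, q, r)
    residuals.append(max(abs(x - D * y) for x, y in zip(matvec(Q, v), v)))
    print(r, D, degeneracy(m, r), residuals[-1])
```

The degeneracies add up to $n$, and the degeneracy-weighted sum of the eigenvalues reproduces the trace $n\, q_D$, which is a quick consistency check of the counting.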

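The above results allow us to calculate the trace of a matrix function $F(Q)$, with $n_r$ denoting the degeneracy of $D^{(r)}$,

$$ \mathrm{Tr}\, F(Q) = \sum_{r=0}^{R+1} n_r\, F(D^{(r)}) = n \sum_{r=0}^{R} \frac{1}{m_r} \left[ F(D^{(r)}) - F(D^{(r+1)}) \right] + n\, F(D^{(R+1)}). \tag{6.12} $$

In the continuation process we obtain

$$ \lim_{n \to 0} \frac{1}{n} \mathrm{Tr}\, F(Q) = \int_0^{q_{(1)}} dq\, F'(D(q)) + F(D(q_{(1)})) = \int_0^1 dq \left[ F'(D(q)) - F'(1-q) \right] + F(1) = \int_0^1 dq\, F'(D(q)) + F(0). \tag{6.13} $$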



Note that depending on $F$ not all alternative forms may be meaningful, e.g., if $F(x) = \ln(1 - x)$ or $F(x) = \ln x$
then the second or the third expression is ill defined, respectively. The explicit dependence on $q_{(1)}$ was eliminated from
the second and third formulas. These expressions stay valid also for finite R-RSB. A special case is the calculation of
the determinant for (3.17c),

$$ \lim_{n \to 0} \frac{1}{n} \ln \det Q = \lim_{n \to 0} \frac{1}{n} \mathrm{Tr} \ln Q = \int_0^1 dq \left[ \frac{1}{D(q)} - \frac{1}{1-q} \right], \tag{6.14} $$

where the second formula from (6.13) was used.
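The small-$n$ limit in the determinant formula can be illustrated numerically. The sketch below is an assumption-laden example rather than the paper's procedure: it takes a 1-RSB matrix with unit diagonal, uses the eigenvalues and degeneracies of the preceding discussion continued to a small real $n$, and compares $(1/n)\,\mathrm{Tr} \ln Q$ with a midpoint-rule evaluation of $\int_0^1 dq\, [1/D(q) - 1/(1-q)]$. The parameter values and function names are hypothetical.

```python
import math

def log_det_per_replica(n, m1, q0, q1):
    """(1/n) Tr ln Q for a 1-RSB Parisi matrix with unit diagonal, from the
    three eigenvalues and their degeneracies, continued to real n."""
    D2 = 1.0 - q1                 # index R + 1 = 2
    D1 = D2 + m1 * (q1 - q0)      # index 1
    D0 = D1 + n * q0              # index 0 (non-degenerate)
    # degeneracies: 1, n (1/m1 - 1/n), n (1 - 1/m1)
    return (math.log(D0) + (n / m1 - 1.0) * math.log(D1)
            + (n - n / m1) * math.log(D2)) / n

def log_det_integral(m1, q0, q1, steps=200_000):
    """Midpoint rule for int_0^1 dq [1/D(q) - 1/(1 - q)].
    The integrand vanishes on (q1, 1), where D(q) = 1 - q, so only
    [0, q1] is integrated."""
    h = q1 / steps
    total = 0.0
    for i in range(steps):
        qi = (i + 0.5) * h
        # D(q) = int_q^1 x, with x = 0 on (0, q0), m1 on (q0, q1), 1 on (q1, 1)
        D = (1.0 - q1) + m1 * (q1 - max(qi, q0))
        total += 1.0 / D - 1.0 / (1.0 - qi)
    return total * h

m1, q0, q1 = 0.5, 0.2, 0.7   # assumed 1-RSB parameters, 0 < m1 < 1
small_n = log_det_per_replica(1e-6, m1, q0, q1)
integral = log_det_integral(m1, q0, q1)
print(small_n, integral)
```

In this sketch the two numbers differ only by terms of order $n$ and by the quadrature error.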
Since in the stationarity relation (3.21) the inverse of a Parisi matrix appears, we will calculate that herewith.
Because the diagonalizing transformation depends only on the $m_r$-s, but not on the $q_r$-s, the
inverse of a Parisi matrix is a Parisi matrix with the same $\{m_r\}$ set. Thus also the elements of the inverse matrix
depend only on the merger index $r(a, b)$ introduced in (5.14). It is convenient to parametrize them also by $q$ as

$$ [Q^{-1}]_{ab} = q^{(-1)}(q_{r(a,b)}). \tag{6.15} $$

This defines a function $q^{(-1)}(q)$ by continuation, that has plateaus within $(q_{r-1}, q_r)$ in the R-RSB scheme. Equivalently,
the inverse matrix can be represented by the inverse of $q^{(-1)}(q)$, the function $x^{(-1)}(q)$ (not to be confounded
with the inverse of $q(x)$ that is $x(q)$). The two characteristics are related through

$$ x^{(-1)}(q^{(-1)}(q)) = x(q). \tag{6.16} $$

This expresses the fact that in a finite R-RSB the set of $x_r$ indices is the same for $Q$ and $Q^{-1}$. The spectra are in
reciprocal relation, for $q \le q_{(1)}$,

$$ D^{(-1)}(q^{(-1)}(q)) = \frac{1}{D(q)}, \tag{6.17} $$

whence by differentiation, using (6.9) on each side, and requiring $q^{(-1)}(0) = 0$, we arrive at

$$ q^{(-1)}(q) = -\int_0^q \frac{d\bar q}{D(\bar q)^2}. \tag{6.18} $$

This leaves the diagonal elements $q^{(-1)}_{R+1} = q^{(-1)}(1)$ of $Q^{-1}$ undetermined; these are obtained from the reciprocal
relation of the respective eigenvalues of index $R + 1$, yielding

$$ q^{(-1)}(1) = \frac{1}{1 - q_{(1)}} - \int_0^{q_{(1)}} \frac{dq}{D(q)^2}. \tag{6.19} $$

An attempt at continuation of $q^{(-1)}(q)$ between $q_{(1)}$ and 1 shows that $q^{(-1)}(q)$ is non-monotonic. Again, relations
(6.18, 6.19) equally hold for the discrete R-RSB case, as well as when $x(q)$ has both plateaus and curved segments,
with the usual reservation that (6.18) relates matrix elements only when $\dot x(q) > 0$.
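The statement that the inverse of a Parisi matrix is again a Parisi matrix with the same box sizes, and with reciprocal spectrum, can be tested at finite integer $n$. The sketch below is illustrative only: the helper names and the numbers are assumptions, and the finite-$n$ levels $\tilde q_r$ of the inverse are reconstructed from the discrete eigenvalue formula applied to $1/D^{(r)}$, so $\tilde q_0 \neq 0$ here, unlike in the $n \to 0$ formula (6.18).

```python
def parisi_matrix(m, q):
    """Parisi matrix for box sizes m = [n, m_1, ..., 1] and levels q."""
    n = m[0]
    return [[q[max(r for r in range(len(m)) if a // m[r] == b // m[r])]
             for b in range(n)] for a in range(n)]

def spectrum(m, q):
    """Eigenvalues D^(r) = sum_{p >= r} m_p (q_p - q_{p-1}), q_{-1} = 0."""
    q_prev = [0.0] + list(q[:-1])
    return [sum(m[p] * (q[p] - q_prev[p]) for p in range(r, len(m)))
            for r in range(len(m))]

def inverse_levels(m, q):
    """Levels of Q^{-1}: the same eigenvalue formula must give 1/D^(r), so
    m_r (qt_r - qt_{r-1}) = 1/D^(r) - 1/D^(r+1), with 1/D^(R+2) := 0."""
    inv = [1.0 / d for d in spectrum(m, q)] + [0.0]
    levels, level = [], 0.0
    for r in range(len(m)):
        level += (inv[r] - inv[r + 1]) / m[r]
        levels.append(level)
    return levels

m = [8, 4, 2, 1]          # assumed example
q = [0.1, 0.3, 0.6, 1.0]  # assumed example
Q = parisi_matrix(m, q)
Q_inv = parisi_matrix(m, inverse_levels(m, q))

n = m[0]
product = [[sum(Q[a][c] * Q_inv[c][b] for c in range(n)) for b in range(n)]
           for a in range(n)]
max_error = max(abs(product[a][b] - (1.0 if a == b else 0.0))
                for a in range(n) for b in range(n))
print(inverse_levels(m, q), max_error)
```

The product of the two matrices is the identity to rounding accuracy, which confirms that only the levels, not the block structure, change under inversion.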



C. Symmetries of Parisi's PDE 



A systematic procedure of identifying all continuous symmetries of a PDE is the so-called prolongation method
[228]. The knowledge of a continuous symmetry group allows one to generate out of a given solution a family of other
solutions.

Via the prolongation method we find by construction that there are altogether three one-parameter transformations
leaving the PPDE (4.36) invariant. The action of these symmetries on a solution $\varphi(q, y)$ can be given as a one-parameter
family $\varphi(s, q, y)$, with $\varphi(0, q, y) = \varphi(q, y)$. These one-parameter families are

$$ \varphi_1(s, q, y) = \varphi(q, y + s), \tag{6.20a} $$
$$ \varphi_2(s, q, y) = \varphi(q, y) + s, \tag{6.20b} $$
$$ \varphi_3(s, q, y) = \varphi(q, y - D(q)\, s) - y s + \tfrac{1}{2} D(q)\, s^2, \tag{6.20c} $$

where $D(q)$ is defined by (6.9). The fact that the above families are solutions of the PPDE (4.36), provided $\varphi(q, y)$ is
also a solution, can also be shown by substitution. The additional statement, namely, that there are no more continuous
symmetries, follows from the construction of the prolongation method that we cannot undertake to describe here.

Eq. (6.20a) represents translation in $y$, while (6.20b) is a shift of the field $\varphi$ by a constant; these symmetries are
obvious. The third one, (6.20c), is less so: it is a shift of the origin in $y$ and of the field $\varphi$, and a 'tilting' of the field $\varphi$
in $y$.
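The invariance under the third transformation can be verified by substitution on a concrete solution. The sketch below rests on several assumptions that are not stated in this section: the PPDE is taken in the form $\partial_q \varphi = -\tfrac12 \partial_y^2 \varphi - \tfrac12 x(q) (\partial_y \varphi)^2$ (the form implied by the equations for $f$ and $m$ in Section VII), $x(q)$ is set to a constant `X0` purely for convenience, and the test solution is built from two exponentials through the Cole-Hopf substitution. Derivatives are taken by central finite differences.

```python
import math

X0 = 0.5            # constant x(q); an assumption made for simplicity
K1, K2 = 0.7, -1.3  # arbitrary parameters of the test solution

def D(q):
    """D(q) = int_q^1 x for constant x."""
    return X0 * (1.0 - q)

def phi(q, y):
    """Exact solution for constant x: u = exp(X0 * phi) obeys the backward
    heat equation d_q u = -(1/2) d_y^2 u."""
    u = (math.exp(K1 * y + 0.5 * K1 ** 2 * (1.0 - q))
         + math.exp(K2 * y + 0.5 * K2 ** 2 * (1.0 - q)))
    return math.log(u) / X0

def phi3(s, q, y):
    """Transformed field: shift in y, shift of the field, and tilt."""
    return phi(q, y - D(q) * s) - y * s + 0.5 * D(q) * s ** 2

def ppde_residual(f, q, y, h=1e-4):
    """d_q f + (1/2) d_y^2 f + (1/2) X0 (d_y f)^2 by central differences."""
    fq = (f(q + h, y) - f(q - h, y)) / (2 * h)
    fy = (f(q, y + h) - f(q, y - h)) / (2 * h)
    fyy = (f(q, y + h) - 2 * f(q, y) + f(q, y - h)) / h ** 2
    return fq + 0.5 * fyy + 0.5 * X0 * fy ** 2

points = [(0.3, 0.4), (0.6, -1.1), (0.8, 2.0)]
res_original = [ppde_residual(phi, q, y) for q, y in points]
res_tilted = [ppde_residual(lambda q_, y_: phi3(0.8, q_, y_), q, y)
              for q, y in points]
print(res_original, res_tilted)
```

Both residual lists vanish up to discretization error, in line with the claim that the tilted family solves the same equation.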

The symmetry transformation equally changes the initial condition. As a forward reference we note that, in the
case of the energy term for the storage problem of a single neuron, the PPDE has the error measure potential
$V(y)$ as initial condition. The constant shift and the 'tilting' in $y$ change $V(y)$ such that it no longer satisfies the
properties of $V(y)$ outlined in Section III A. Thus uncovering the above symmetries is of little help in finding solutions
to the PPDE in the neuron problem at hand. However, given the relevance of the Parisi solution to a vast class of
disordered systems, we considered the symmetries worth presenting.

D. Spherical entropic term: a solvable case of Parisi's PDE



While most of the relevant quantities related to the spherical entropic term are straightforward to calculate, from
the technical viewpoint they represent a solvable example of Parisi's framework, suitable for an exercise.

Note that the general distribution (3.6) does not include the overall spherical normalization, so the results
on independent synapses do not carry over. Nevertheless, we can cast (3.17c) into the general form (4.1) with the
association

$$ e^{\Phi_s(y)} = \sqrt{2\pi}\, \delta(y). \tag{6.21} $$

The subscript $s$ signals that we are dealing with the entropic term of the free energy. Since we need to regularize the
Dirac delta, we use a Gaussian with small variance $\sigma$. With the notation (4.92) we have

$$ \Phi_{s,\sigma}(y) = \ln\left( \sqrt{2\pi}\, G(y, \sigma) \right) = -\frac{y^2}{2\sigma} - \frac{1}{2} \ln \sigma. \tag{6.22} $$

Thus

$$ f_s(Q) = -(2\beta)^{-1} \ln \det Q = \lim_{\sigma \to 0} \beta^{-1} n\, \varphi[\Phi_{s,\sigma}(y), Q]. \tag{6.23} $$



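We keep $\sigma$ finite while performing continuation, i.e., the limits $n \to 0$ and $\sigma \to 0$ will be interchanged. Then we need
to solve the PDE (4.36a) with initial condition

$$ \varphi_{s,\sigma}(1, y) = \Phi_{s,\sigma}(y). \tag{6.24} $$

This can be done by our assuming that $\varphi_{s,\sigma}(q, y)$ is a quadratic polynomial in $y$. With the notation
$D_\sigma(q) = \sigma + D(q)$, the solution is

$$ \varphi_{s,\sigma}(q, y) = -\frac{y^2}{2 D_\sigma(q)} - \frac{1}{2} \int_q^1 \frac{d\bar q}{D_\sigma(\bar q)} - \frac{1}{2} \ln \sigma. \tag{6.25} $$

Hence we obtain

$$ \lim_{n \to 0} \frac{1}{n} f_s(Q) = f_s[x(q)] = \lim_{\sigma \to 0} \beta^{-1} \varphi_{s,\sigma}(0, 0) = \lim_{\sigma \to 0} -(2\beta)^{-1} \left[ \int_0^1 \frac{dq}{D_\sigma(q)} + \ln \sigma \right]. \tag{6.26} $$

The rightmost expression is, apart from a prefactor, equivalent to (6.14). Either of them thus gives Eq. (3.17c).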
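The quadratic ansatz can be checked directly. The following sketch makes several assumptions for illustration: the OPF is chosen as $x(q) = q$, so that $D(q) = (1 - q^2)/2$ and all integrals are available in closed form; the PPDE is taken in the form $\partial_q \varphi = -\tfrac12 \partial_y^2 \varphi - \tfrac12 x (\partial_y \varphi)^2$; and the regularizer `SIGMA` is a small but finite number. The code verifies by finite differences that the quadratic field solves the PDE, and that $\int_0^1 dq / D_\sigma + \ln \sigma$ approaches $\int_0^1 dq\, [1/D - 1/(1-q)]$, which equals $\ln 2$ for this choice of $x(q)$.

```python
import math

SIGMA = 1e-6                        # regularizing variance (assumed value)
A = math.sqrt(1.0 + 2.0 * SIGMA)    # D_sigma(q) = (A^2 - q^2) / 2 for x(q) = q

def x(q):
    return q                        # illustrative OPF, an assumption

def D_sigma(q):
    return SIGMA + 0.5 * (1.0 - q * q)

def tail_integral(q):
    """int_q^1 dq' / D_sigma(q') in closed form."""
    F = lambda t: math.log((A + t) / (A - t)) / A
    return F(1.0) - F(q)

def phi(q, y):
    """Quadratic solution with initial condition -y^2/(2 sigma) - (1/2) ln sigma."""
    return (-y * y / (2.0 * D_sigma(q)) - 0.5 * tail_integral(q)
            - 0.5 * math.log(SIGMA))

def ppde_residual(q, y, h=1e-4):
    """d_q phi + (1/2) d_y^2 phi + (1/2) x (d_y phi)^2 by central differences."""
    fq = (phi(q + h, y) - phi(q - h, y)) / (2 * h)
    fy = (phi(q, y + h) - phi(q, y - h)) / (2 * h)
    fyy = (phi(q, y + h) - 2 * phi(q, y) + phi(q, y - h)) / h ** 2
    return fq + 0.5 * fyy + 0.5 * x(q) * fy ** 2

residuals = [ppde_residual(q, y) for q, y in [(0.3, 0.4), (0.8, -0.2)]]
limit_value = tail_integral(0.0) + math.log(SIGMA)   # -> ln 2 as sigma -> 0
print(residuals, limit_value, math.log(2.0))
```

For other choices of $x(q)$ the closed-form integral would have to be replaced by a quadrature, but the structure of the check stays the same.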

As it has been described in Section V, expectation values are calculated by using $G_\varphi$. That is, by Eq. (4.70), the GF
of the linear PDE (4.49). Given the field

$$ \mu_{s,\sigma}(q, y) = -\frac{y}{D_\sigma(q)}, \tag{6.27} $$

from the definition (4.45) and from (6.25), the GF is found to be Gaussian,

$$ G_{s,\varphi}(q_2, y_2; q_1, y_1) = G(A, B), \tag{6.28a} $$
$$ A = y_2\, \frac{D_\sigma(q_1)}{D_\sigma(q_2)} - y_1, \tag{6.28b} $$
$$ B = D_\sigma(q_1)^2 \left[ E_\sigma(q_1) - E_\sigma(q_2) \right], \tag{6.28c} $$
$$ E_\sigma(q) = \int_0^q \frac{d\bar q}{D_\sigma(\bar q)^2}. \tag{6.28d} $$

Note that we omitted the subscript $\sigma$ from the GF. Sompolinsky's time-dependent density is by (4.72)

$$ P_s(q, y) = G\left( y, D_\sigma(q)^2 E_\sigma(q) \right). \tag{6.29} $$

The generalized two-replica correlator (5.36) is thus

$$ \Xi_s(q_1, y_1; q_2) = E_\sigma(q_2) - E_\sigma(q_1) + \frac{y_1^2}{D_\sigma(q_1)^2} - \sigma^{-1} \theta(q_2 - 1^{-0}), \tag{6.30} $$

whence the correlation function (5.33) is

$$ C_s^{(2)}(q) = \Xi_s(0, 0; q) = E_\sigma(q) - \sigma^{-1} \theta(q - 1^{-0}). \tag{6.31} $$

The regularizing parameter $\sigma$ can be taken zero at many a place in the above formulae, an exception being the
correlator at $q = 1$, where the integration should be performed first in $(q_{(1)}, 1)$ to get the finite result $C_s^{(2)}(1) =
E(q_{(1)}) - (1 - q_{(1)})^{-1}$. We evaluate the first of the two 4-replica correlators from (5.35) as

$$ C_s^{(4,1)}(q_1, q_2, q_3) = 2 E_\sigma(q_1)^2 + \left[ E_\sigma(q_2) - \sigma^{-1} \theta(q_2 - 1^{-0}) \right] \left[ E_\sigma(q_3) - \sigma^{-1} \theta(q_3 - 1^{-0}) \right]. \tag{6.32} $$

The replicon eigenvalue from (5.62) is

$$ \lambda_s(q_1, q_2, q_3) = \left[ D_\sigma(q_2)\, D_\sigma(q_3) \right]^{-1}, \tag{6.33} $$

independent of $q_1$. Note that the maximal argument allowed in the eigenvalue is $q_{(1)}$; if this is smaller than 1 then
the regularization parameter $\sigma$ can be omitted. The one WTI (5.69) can be checked directly by comparing Eqs.
(6.31, 6.33).

In this example the PPDE could be solved in closed form. The question obviously arises, under what conditions can
the solution be obtained analytically. It is easy to see that if the initial condition at $q = 1$ is quadratic in $y$ then, for
arbitrary $x(q)$, the solution can be explicitly given as a quadratic function in $y$, with $q$-dependent coefficients. We did
not find other analytic solutions for a general $x(q)$, but of course for special $x(q)$-s, like step functions, the PPDE
can be solved in closed form.



E. Small field expansion 



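The case of an overall small function $\Phi(y)$ in (4.1) is of interest because, on the one hand, in the neuron problem
it corresponds to the high temperature limit, and on the other, it will yield the usual energy term in several of the
infinite range interaction spin glass models. The latter feature stresses the generality of the framework discussed in
this paper. We can apply straightforward perturbation expansion by introducing a small parameter $\epsilon$ and writing

$$ \varphi[\epsilon \Phi(y), Q] = \epsilon\, \varphi_1[\Phi(y), Q] + \epsilon^2 \varphi_2[\Phi(y), Q] + O(\epsilon^3), \tag{6.34} $$

where we took into account that the $O(\epsilon^0)$ term vanishes. The linear term is

$$ \varphi_1[\Phi(y), Q] = \frac{1}{n} \int \frac{d^n x\, d^n y}{(2\pi)^n}\, e^{\,i x y - \frac{1}{2} x Q x} \sum_{a=1}^{n} \Phi(y_a) = \frac{1}{n} \sum_{a=1}^{n} \int \frac{dx\, dy}{2\pi}\, \Phi(y)\, e^{\,i x y - \frac{1}{2} q_{aa} x^2} = \frac{1}{n} \sum_{a=1}^{n} \int Dz\, \Phi(z \sqrt{q_{aa}}), \tag{6.35} $$

which, for $q_{aa} = 1$, gives

$$ \varphi_1[\Phi(y), Q] = \int Dz\, \Phi(z). \tag{6.36} $$

In $O(\epsilon^2)$ we obtain

$$ \varphi_2[\Phi(y), Q] = \frac{1}{2n} \sum_{a,b=1}^{n} \int \frac{dx_1\, dy_1}{2\pi} \frac{dx_2\, dy_2}{2\pi}\, \Phi(y_1)\, \Phi(y_2)\, e^{\,i x y - \frac{1}{2} (q_{aa} x_1^2 + q_{bb} x_2^2) - q_{ab} x_1 x_2} - \frac{n}{2} \varphi_1^2. \tag{6.37} $$

In the last expression $i x y$ is shorthand for $i (x_1 y_1 + x_2 y_2)$. When $n \to 0$ the term $\frac{n}{2} \varphi_1^2$ vanishes.
In the generic case of $q_{aa} = q_D = 1$ we obtain after elementary manipulations

$$ \varphi_2[\Phi(y), Q] = \frac{1}{2n} \sum_{a,b=1}^{n} \Psi(q_{ab}), \tag{6.38} $$

where

$$ \Psi(q) = \int Dz_1\, Dz_2\, \Phi(\mathbf{n}_1 \mathbf{z})\, \Phi(\mathbf{n}_2 \mathbf{z}), \tag{6.39a} $$
$$ |\mathbf{n}_1| = |\mathbf{n}_2| = 1, \quad \mathbf{n}_1 \mathbf{n}_2 = q. \tag{6.39b} $$

Here $\mathbf{n}_1 \mathbf{z}$, etc. denote scalar products of two-dimensional vectors. In the continuous limit

$$ \varphi_2[\Phi(y), x(q)] = \frac{1}{2} \int_0^1 dq\, x(q)\, \dot\Psi(q) \tag{6.40} $$

results, where (6.2) with $q_{aa} = 1$ was used; thus the term (6.34) is hereby resolved up to $O(\epsilon^2)$.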
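For the derivatives of $\Psi(q)$, with the notation (4.105), we obtain the suggestive formula

$$ \Psi^{(k)}(q) = \int Dy_1\, Dy_2\, \Phi^{(k)}(\mathbf{n}_1 \mathbf{y})\, \Phi^{(k)}(\mathbf{n}_2 \mathbf{y}), \tag{6.41} $$

together with the condition (6.39b). This yields a simple relation between appropriate expansion coefficients of the
functions $\Phi(y)$ and $\Psi(q)$. Namely, if

$$ \Psi(q) = \sum_{k=0}^{\infty} \Psi_k\, q^k, \tag{6.42} $$

then applying (6.41) with (6.39b) at $q = 0$ we get

$$ \Psi_k = \frac{1}{k!} \Psi^{(k)}(0) = \frac{1}{k!} \left[ \int Dy\, \Phi^{(k)}(y) \right]^2. \tag{6.43} $$

On the other hand, assuming that $\Phi(y)$ is not diverging too fast for large $|y|$, we have

$$ \int Dy\, \Phi^{(k)}(y) = (-1)^k \int Dy\, \Phi(y)\, e^{y^2/2} \frac{d^k}{dy^k} e^{-y^2/2} = 2^{-k/2} \int Dy\, \Phi(y)\, H_k\!\left( \frac{y}{\sqrt 2} \right), \tag{6.44} $$

where $H_k(y)$ is the $k$-th Hermite polynomial. Hence, given the Hermite expansion of $\Phi(y)$ as

$$ \Phi(y) = \sum_{k=0}^{\infty} \Phi_k\, H_k\!\left( \frac{y}{\sqrt 2} \right), \tag{6.45} $$

then using the orthogonality

$$ \int Dy\, H_k\!\left( \frac{y}{\sqrt 2} \right) H_l\!\left( \frac{y}{\sqrt 2} \right) = 2^k\, k!\, \delta_{kl}, \tag{6.46} $$

we have for the Taylor coefficients of $\Psi(q)$

$$ \Psi_k = k!\, 2^k\, \Phi_k^2. \tag{6.47} $$

In conclusion, for a given analytic $\Psi(q)$, with nonnegative Taylor coefficients, we can thus construct a $\Phi(y)$ that
reproduces $\Psi(q)$ through the expression (6.39a). The correspondence between $\Psi(q)$ and $\Phi(y)$ is not one-to-one,
because all $\Phi(y)$-s with Hermite coefficients $\pm \Phi_k$ will yield the same $\Psi(q)$.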
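The relation between Hermite coefficients of $\Phi(y)$ and Taylor coefficients of $\Psi(q)$ can be illustrated on a low-order polynomial. The sketch below is an illustrative assumption, not part of the paper: it takes $\Phi(y) = y + (y^2 - 1)$, which in terms of the physicists' Hermite polynomials $H_1(u) = 2u$ and $H_2(u) = 4u^2 - 2$ has $\Phi_1 = 1/\sqrt 2$ and $\Phi_2 = 1/2$, so that $\Psi_k = k!\, 2^k \Phi_k^2$ predicts $\Psi(q) = q + 2q^2$. The double Gaussian integral defining $\Psi(q)$ is evaluated on a plain grid, with $\mathbf{n}_1 = (1, 0)$ and $\mathbf{n}_2 = (q, \sqrt{1 - q^2})$.

```python
import math

H = 0.1
NODES = [-8.0 + i * H for i in range(161)]
# Gaussian measure Dz on an equidistant grid (simple quadrature)
WEIGHTS = [H * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) for z in NODES]

def psi(Phi, q):
    """Psi(q) = int Dz1 Dz2 Phi(n1.z) Phi(n2.z), with n1 = (1, 0) and
    n2 = (q, sqrt(1 - q^2)), so that n1.n2 = q."""
    c = math.sqrt(1.0 - q * q)
    total = 0.0
    for z1, w1 in zip(NODES, WEIGHTS):
        for z2, w2 in zip(NODES, WEIGHTS):
            total += w1 * w2 * Phi(z1) * Phi(q * z1 + c * z2)
    return total

def Phi(y):
    # Phi_1 H_1(y / sqrt 2) + Phi_2 H_2(y / sqrt 2), Phi_1 = 1/sqrt 2, Phi_2 = 1/2
    return y + (y * y - 1.0)

def psi_from_hermite(q):
    # Psi_k = k! 2^k Phi_k^2 gives Psi_1 = 1 and Psi_2 = 2
    return q + 2.0 * q * q

q_test = 0.3
numeric = psi(Phi, q_test)
predicted = psi_from_hermite(q_test)
print(numeric, predicted)
```

Flipping the sign of either Hermite coefficient, for instance using $-y + (y^2 - 1)$, leaves the predicted $\Psi(q)$ unchanged, which is the non-uniqueness noted in the text.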

The expression (6.38) is the ubiquitous form for the energy term in various SK-type models,

$$ f_e(Q) = A \sum_{a,b=1}^{n} \Psi(q_{ab}), \tag{6.48} $$

where $A$ is a prefactor depending on the model. In particular, we have in the cases of the SK spin glass [14], the
p-spin interaction [ p3[ , and Nieuwenhuizen's multi-p-spin interaction model j pSj j , for $\Psi(q)$ the functions $q^2$, $q^p$, and
$f(q)$, respectively. (The corresponding formula with $\Psi(q) = q^2$, in the SK replica free energy, is the second term in
Eq. (2.8).) The multi-p-spin interaction model, which incorporates the Ising and p-spin as special cases, has the p-spin
component entering through the characteristic exchange constant $J_p$ and leads to



p=l 



whence 



Given <P(y), 



m = E 



=J ^(P-I)! 



JO 



A(Q)= n A^(j,),Q] 



(6.49) 



(6.50) 



(6.51) 



-=o 



relates the spin glass energy term to the general framework expounded in this chapter. Due to the fact that the
Taylor coefficients of $\Psi(q)$ in a multi-p-spin interaction system are necessarily non-negative, a given $\Psi(q)$ uniquely
determines the corresponding multi-p-spin interaction model.

Whereas for the evaluation of the free energy term (6.34) the usage of PDE-s could be avoided, we invoke the
auxiliary $q$- and $y$-dependent fields for the calculation of expectation values. Since now the initial condition of the
PPDE (4.36) and thus the solution of it, $\varphi(q, y)$, is of $O(\epsilon)$, in lowest order the nonlinear term in (4.36) can be omitted.
Writing $\varphi(q, y) \approx \epsilon\, \varphi_1(q, y)$, and using similar notation for the derivative fields $\mu(q, y)$ and $\kappa(q, y)$, we obtain linear
diffusion equations for the fields $\varphi_1(q, y)$, $\mu_1(q, y)$, and $\kappa_1(q, y)$. Hence

$$ \varphi_1(q, y) = \int dy_1\, G(y - y_1, 1 - q)\, \Phi(y_1), \tag{6.52} $$
$$ \mu_1(q, y) = \int dy_1\, G(y - y_1, 1 - q)\, \Phi'(y_1), \tag{6.53} $$
$$ \kappa_1(q, y) = \int dy_1\, G(y - y_1, 1 - q)\, \Phi''(y_1). \tag{6.54} $$

The GF (4.70) is in leading order a Gaussian

$$ G_\varphi(q_2, y_2; q_1, y_1) = G(y_1 - y_2, q_1 - q_2) + O(\epsilon), \tag{6.55} $$

and

$$ P(q, y) = G_\varphi(0, 0; q, y) = G(y, q) + O(\epsilon). \tag{6.56} $$

Thus the two-replica correlator (we only treat here the $q < 1$ case) is by Eq. (5.33) in leading order

$$ C_x^{(2)}(q) = \epsilon^2 \int dy_1\, dy_2\, dy_3\, G(y_1, q)\, G(y_1 - y_2, 1 - q)\, \Phi'(y_2)\, G(y_1 - y_3, 1 - q)\, \Phi'(y_3) = \epsilon^2 \dot\Psi(q). \tag{6.57} $$

From the non-negativity of the Taylor coefficients of $\Psi(q)$, see Eq. (6.49), it follows that $C_x^{(2)}(q) \ge 0$. The replicon
spectrum of (5.63) can also be evaluated by our noting that (5.65) is now

$$ \Lambda(q_1, y_1; q_2) = \epsilon\, \kappa_1(q_1, y_1) + O(\epsilon^2), \tag{6.58} $$

independent of $q_2$, whence in leading order

$$ \lambda(q_1, q_2, q_3) = \epsilon^2 \int dy_1\, dy_2\, dy_3\, G(y_1, q_1)\, G(y_1 - y_2, 1 - q_1)\, \Phi''(y_2)\, G(y_1 - y_3, 1 - q_1)\, \Phi''(y_3) = \epsilon^2 \ddot\Psi(q_1). \tag{6.59} $$

Due to the non-negativity of the Taylor coefficients in Eq. (6.49) we have $\lambda(q_1, q_2, q_3) \ge 0$. Comparison with
(6.57) shows immediately that the WTI (5.69) is satisfied. The eigenvalues associated with the SK-type energy term (6.48)
are obtained, based on (6.51), as $2 A \ddot\Psi(q_1)$.

VII. THE NEURON: SPHERICAL SYNAPSES



Having worked out the technical tools in the previous sections, we are now in the position to apply them to the
storage problem of the McCulloch-Pitts neuron.



A. General results 

1. Free energy and stationarity condition 

The free energy (3.17) can be resolved based on the results of Sections IV, V, and VI with the substitution

$$ \Phi(y) = -\beta V(y). \tag{7.1} $$

The specific formula for the free energy is one of our main results, so however elementary the above substitution is,
we collect the relevant expressions below. Introducing the field

$$ f(q, y) = -\beta^{-1} \varphi(q, y), \tag{7.2} $$

we obtain, from Eqs. (4.2) and (4.38), the energy contribution to the free energy term (3.17d), as a functional of the
OPF $x(q)$,

$$ f_e[x(q)] = \lim_{n \to 0} \frac{1}{n} f_e(Q) = -\beta^{-1} \varphi[-\beta V(y), Q] \Big|_{n=0} = f(0, 0), \tag{7.3} $$

where, from (4.36), the $f(q, y)$ is the solution of

$$ \partial_q f = -\tfrac{1}{2} \partial_y^2 f + \tfrac{1}{2} \beta x\, (\partial_y f)^2, \tag{7.4a} $$
$$ f(1, y) = V(y). \tag{7.4b} $$

The analog of the function $\mu(q, y)$ of Eq. (4.45), useful for the calculation of replica correlators, is now

$$ m(q, y) = \partial_y f(q, y) = -\beta^{-1} \mu(q, y), \tag{7.5} $$

and from (4.46) we get

$$ \partial_q m = -\tfrac{1}{2} \partial_y^2 m + \beta x\, m\, \partial_y m, \tag{7.6a} $$
$$ m(1, y) = V'(y). \tag{7.6b} $$

By introducing

$$ \chi(q, y) = \partial_y^2 f(q, y) = -\beta^{-1} \kappa(q, y) \tag{7.7} $$

we obtain

$$ \partial_q \chi = -\tfrac{1}{2} \partial_y^2 \chi + \beta x \left( m\, \partial_y \chi + \chi^2 \right), \tag{7.8a} $$
$$ \chi(1, y) = V''(y). \tag{7.8b} $$

The $q$-dependent probability density $P(q, y)$, satisfying the SPDE (4.53, 4.54), now obeys

$$ \partial_q P = \tfrac{1}{2} \partial_y^2 P + \beta x\, \partial_y (P m), \tag{7.9a} $$
$$ P(0, y) = \delta(y). \tag{7.9b} $$

The entropic term (3.17c) has essentially been calculated through the formula (6.14), whence we have

$$ f_s[x(q)] = \lim_{n \to 0} \frac{1}{n} f_s(Q) = -\frac{1}{2\beta} \int_0^1 dq \left[ \frac{1}{D(q)} - \frac{1}{1 - q} \right]. \tag{7.10} $$
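For a step-like $x(q)$ the PDE for $f(q, y)$ can be integrated backward from $q = 1$ plateau by plateau. The sketch below is a hedged illustration with hypothetical names (`free_energy_term`, `step_back`), not the paper's numerical method: on a plateau of height $x > 0$ it uses the Cole-Hopf substitution, for which $e^{-\beta x f}$ obeys the backward heat equation, and on a plateau with $x = 0$ the equation is linear. Gaussian averages are evaluated on a plain equidistant grid, and the test potential $V(y) = k y^2/2$ is chosen only because the RS result is then known in closed form.

```python
import math

H = 0.2
NODES = [-6.0 + i * H for i in range(61)]
WEIGHTS = [H * math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi) for z in NODES]

def gauss_avg(g):
    """int Dz g(z) on an equidistant grid."""
    return sum(w * g(z) for z, w in zip(NODES, WEIGHTS))

def step_back(f, beta, x, dq):
    """Propagate f backward over a plateau of x(q) with width dq."""
    s = math.sqrt(dq)
    if x == 0.0:
        # linear diffusion: f_new(y) = int Dz f(y + s z)
        return lambda y: gauss_avg(lambda z: f(y + s * z))
    # Cole-Hopf: exp(-beta x f) obeys the backward heat equation
    return lambda y: -math.log(
        gauss_avg(lambda z: math.exp(-beta * x * f(y + s * z)))) / (beta * x)

def free_energy_term(V, beta, q_levels, m_levels):
    """f(0, 0) for x = 0 on (0, q_0), m_r on (q_{r-1}, q_r), 1 on (q_R, 1)."""
    edges = [0.0] + list(q_levels) + [1.0]
    heights = [0.0] + list(m_levels) + [1.0]
    f = V                                   # initial condition f(1, y) = V(y)
    for a, b, x in reversed(list(zip(edges[:-1], edges[1:], heights))):
        f = step_back(f, beta, x, b - a)
    return f(0.0)

beta, k, q0 = 2.0, 1.0, 0.4                # assumed example values
V = lambda y: 0.5 * k * y * y              # test potential, an assumption
rs_numeric = free_energy_term(V, beta, [q0], [])
c = 1.0 + beta * k * (1.0 - q0)
rs_exact = math.log(c) / (2.0 * beta) + k * q0 / (2.0 * c)
print(rs_numeric, rs_exact)
```

A 1-RSB evaluation is obtained by passing two levels and one plateau height, e.g. `free_energy_term(V, beta, [0.4, 0.7], [0.5])`; the cost grows exponentially with the number of nested plateaus in this naive recursion, so a grid-based solver would be preferable for higher $R$.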
The specific form of the stationarity condition (3.21) immediately follows from (6.18) together with the two-replica correlator of the energy term as

$$ \int_0^q \frac{d\bar q}{D(\bar q)^2} = \alpha \beta^2 \int dy\, P(q, y)\, m(q, y)^2. \tag{7.11} $$

This equation holds at isolated $q_r$-s in an R-RSB scheme, and does so identically in an interval where $\dot x(q) > 0$. The
question of plateaus with value in $(0, 1)$ will be efficiently treated by the variational formalism of Section VII A 3.
The stationary OPF $x(q)$ should be substituted into (7.3) and (7.10), which by (3.17b) sum up to the value of the
thermodynamical free energy

$$ f = f_s[x(q)] + \alpha f_e[x(q)], \tag{7.12} $$

whose ingredients we redisplay as

$$ f_s[x(q)] = -\frac{1}{2\beta} \int_0^1 dq \left[ \frac{1}{D(q)} - \frac{1}{1 - q} \right], \tag{7.13a} $$
$$ f_e[x(q)] = f(0, 0). \tag{7.13b} $$


The distribution of local stabilities, introduced in (3.27), is an expectation value of a type previously calculated. Using
Eq. (5.11) we have the simple result

$$ \rho(\Delta) = \langle\langle \delta(y_1 - \Delta) \rangle\rangle = \int dy\, P(1, y)\, \delta(y - \Delta) = P(1, \Delta). \tag{7.14} $$

The energy can be directly obtained from this distribution by Eq. (3.29) as

$$ \langle\langle V(y_1) \rangle\rangle = \int dy\, P(1, y)\, V(y). \tag{7.15} $$

The average of any function of $\Delta$ is, in general, the function's average over the distribution $P(1, y)$. This shows the
physical meaning of the auxiliary variable $y$ within the Parisi framework: it is the stability parameter, extended to
any intermediary stage $q$, obeying a distribution $P(q, y)$, that becomes at $q = 1$ the physically observable distribution
$P(1, y)$. The entropy is, by the above formulas,

$$ s = \frac{1}{2} \int_0^1 dq \left[ \frac{1}{D(q)} - \frac{1}{1 - q} \right] + \alpha \beta \left[ \int dy\, P(1, y)\, V(y) - f(0, 0) \right]. \tag{7.16} $$

Given the monotonicity of the OPF, the Edwards-Anderson order parameter (3.31) can be cast as

$$ q_{EA} = \max_{x < 1} q(x). \tag{7.17} $$

This is the maximal $q$ that has non-vanishing probability, $P(q) = \dot x(q) > 0$; in the notation of Section IV B 1 we have
$q_{EA} = q_{(1)}$.

In summary, as we demonstrated in Section VI A, $x(q) = \int_0^q d\bar q\, P(\bar q)$, where $P(q)$ is the probability density of the
overlap $q$ between two synaptic configurations. Thus $x(q)$ is monotonous and invertible with inverse $q(x)$, allowance
given for plateaus and isolated discontinuities in these functions. The conclusion of the present section is that the
equilibrium properties of the neuron model are determined by the stationary shape of $x(q)$, or its inverse $q(x)$, thus
they play the role of order parameter function, in close analogy to spin glasses |lJ,Q .



2. Variational principle: the PPDE as external constraint 



In Section VII A 1 we have given specific forms for the free energy and stationarity conditions of Section III, for the
case of the Parisi ansatz. Those formulas were originally expressed in terms of the $Q$ matrix, while Section VII A 1 has
the field $f(q, y)$, obeying the PPDE. It is natural to ask what happens if we express the free energy in terms of $x(q)$,
and look for its extremum by varying $x(q)$. This is reversing the order of the original recipe, when the stationarity
condition in terms of the elements of $Q$ was taken first and the resulting formula, Eq. (3.21), expressed in terms of
$x(q)$. The equivalence of these two procedures has been seen in the cases $R = 0, 1$ for spin glasses (see e. g. Q) and
the neuron 0J§,0|. It is our observation that the equivalence carries over to the continuous Parisi ansatz. The proof is
in principle given by Eq. (5.45), an identity which tells us that the variation by $x(q)$ is proportional to the two-replica
correlation, obtained by differentiation by $q_{ab}$. We will, however, not leave the matter there and give a self-contained
presentation of the variational theory.

We shall consider two approaches. In this section the PPDE will be maintained as external constraint, while in the
next one it will be included by a multiplier field into the functional to be extremized. The variational formulation
opens the way to alternative methods to find stationary states. Indeed, given the variational free energy to be
extremized, we are no longer bound to the stationarity prescription (7.11) for finding the extremum; rather we can
choose any suitable procedure that is capable of locating the extremum of a functional.

The free energy is then

$$ f = \max_{x(q)} f[x(q)], \tag{7.18} $$

with the free energy functional

$$ f[x(q)] = f_s[x(q)] + \alpha f_e[x(q)] \tag{7.19} $$



as defined by (7.13). The maximization in terms of x(q) is a transfiguration of the original minimization by the matrix
elements of Q due to the n → 0 limit. The PPDE (7.4a) is understood as external constraint, and in what follows
its solution, and in fact the solutions of the related PDE-s of Sections IV B 2, IV B 3, as well as the GF-s of Section
IV B 4, are assumed to be known.



Variation of the free energy gives, following the result of Section V B 1, the sum of the two-replica correlations for
the entropic and the energy term. In fact, in the special case of the spherical entropic term, the functional derivative of
(7.13a) can be straightforwardly calculated. This gives, of course, the same result as that obtained from the correlator
(3.31). Concerning the energy term, we write the correlator (5.33) with the notation (7.5). We recall that the entropic
term was related to the generic free energy term by (6.26) while the energy term (7.3) also involved a minus sign.
Finally, applying (5.45) to both the entropic and the energy term, we get for q < 1

$$ \frac{\delta f[x(q)]}{\delta x(q)} = \frac{\beta}{2}\, F(q,[x(q)]) = \frac{1}{2\beta} \left[ C_s^{(2)}(q) - \alpha\, C_e^{(2)}(q) \right], \tag{7.20a} $$

$$ F(q,[x(q)]) = \int_0^q \frac{d\bar q}{\beta^2 D(\bar q)^2} - \alpha \int dy\, P(q,y)\, m(q,y)^2. \tag{7.20b} $$



Note that we never displayed the functional dependence on the OPF in the correlators, but are doing so in the
functional derivative for clarity. Furthermore, in the second correlator in (7.20a) the subscript e signals that it
comes from the energy term (7.13b); nevertheless, we omit that subscript from the related fields P(q,y) and μ(q,y),
introduced in the previous section.

When x(q) can be freely varied, the stationarity condition is

$$ F(q,[x(q)]) = 0, \tag{7.21} $$



thus (7.11) is recovered. In the case of stationarity for a discrete R-RSB scheme (4.5), the vanishing of (5.44) at each
q_r, r = 0, ..., R is required. This, however, gives only R + 1 equations, insufficient for the determination of all x_r-s
and q_r-s. If variation by x(q) is made with the assumption that x(q) ≡ x in an interval I, where 0 < x < 1, that is,
there is a nontrivial plateau in I, then from (5.47) follows

$$ \int_I dq\, F(q,[x(q)]) = 0 \tag{7.22} $$

as stationarity condition. Thus (7.22) should hold in each interval (q_r, q_{r+1}), r = 0, ..., R - 1, within an R-RSB
scheme. If the stationary OPF has both $\dot x(q) > 0$ and x(q) ≡ x ≠ 0, 1 parts in some intervals, then these imply the
usage of (7.21) and (7.22) in the respective intervals of q, and (7.21) at the jumps between plateaus. We will see that
such a phase, characterized by an x(q) concatenated from a plateau, with a nontrivial plateau value, and a strictly
increasing segment, does arise in the neuron.



3. Variational principle: inclusion of the PPDE 

Sommers and Dupont [29] introduced a variational formalism for the Ising spin glass by including the PPDE into
the free energy functional with a Lagrange multiplier field. The latter turned out to be the field satisfying the SPDE,
and it could be interpreted as the probability density of the local magnetic field. The free energy functional also
depended on, and needed to be varied by, an auxiliary function Δ(x). The latter function turned out not to bring
new degrees of freedom into play because of an additional relation between q(x) and Δ(x). In contrast to former studies
of the SK H and Little-Hopfield fl| models, we did not find it necessary to introduce an additional function, the
analog of Δ(x). The reason for that is, we surmise, that we had chosen x(q) as order parameter function. That has an
immediate physical meaning, as demonstrated in Section VI A, thus no allowance remained for the "gauge" invariance,
inherent in the traditional approach [29]. Moreover, the continuous spectrum (6.9) of the Q matrix turned out to be
proportional to the auxiliary function Δ(x(q)) of Ref. |^| (the cited authors also found this relation), so introducing
the latter as an independent field to be varied does not lead to technical simplification. It should be emphasized that
the "gauge" invariance appeared to be of limited significance only when the stationarity criterion was studied. It is,
however, of import as to fluctuations of Q violating Parisi's ansatz and is the source of the WTI-s [226].

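Following Sommers and Dupont, we shall use the condition (7.4) in the free energy functional as a constraint.
Forcing the PDE (7.4a) gives rise to a Lagrange multiplier field P(q,y), while the initial condition (7.4b) should be
set separately. The result is

$$ f = \max_{x(q)}\ \mathop{\mathrm{extr}}_{f(q,y),\,P(q,y)} f[x(q), f(q,y), P(q,y)], \tag{7.23} $$

with, on the r. h. s., the functional

$$ f[\dots] = f_s[\dots] + \alpha \left( f_e[\dots] + f_{a1}[\dots] + f_{a2}[\dots] \right), \tag{7.24a} $$

$$ f_s[\dots] = -\frac{1}{2\beta} \int_0^1 dq \left[ \frac{1}{D(q)} - \frac{1}{1-q} \right], \tag{7.24b} $$

$$ f_e[\dots] = f(0,0), \tag{7.24c} $$

$$ f_{a1}[\dots] = \int_0^1 dq \int dy\, P(q,y) \left[ \partial_q f(q,y) + \tfrac12 \partial_y^2 f(q,y) - \tfrac12 \beta x(q) \left( \partial_y f(q,y) \right)^2 \right], \tag{7.24d} $$

$$ f_{a2}[\dots] = \int dy\, P(1,y) \left[ V(y) - f(1,y) \right]. \tag{7.24e} $$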



The functional dependence on appropriate arguments is marked by [...]. 
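As a hedged illustration of the entropic term (7.24b), the sketch below evaluates -βf_s for a piecewise constant (R-RSB type) OPF in closed form and compares it with a brute-force midpoint quadrature. The function names, the sample breakpoints and the convention D(q) = ∫_q^1 x(q') dq' are assumptions made for this example only.

```python
import math

def D_of_q(q, qs, xs):
    """D(q) = int_q^1 x(q') dq' for a step-like OPF (assumed convention):
    x = 0 on [0, qs[0]), x = xs[r] on [qs[r], qs[r+1]), x = 1 on [qs[-1], 1]."""
    edges = [0.0] + list(qs) + [1.0]
    values = [0.0] + list(xs) + [1.0]
    total = 0.0
    for lo, hi, x in zip(edges[:-1], edges[1:], values):
        total += x * max(0.0, hi - max(lo, q))
    return total

def minus_beta_fs_steps(qs, xs):
    """Closed form of (1/2) int_0^1 dq [1/D(q) - 1/(1-q)] for the step OPF."""
    d = [0.0] * len(qs)
    d[-1] = 1.0 - qs[-1]                       # D(q_R) = 1 - q_R since x = 1 above q_R
    for r in range(len(qs) - 2, -1, -1):       # build D at the breakpoints downward
        d[r] = d[r + 1] + xs[r] * (qs[r + 1] - qs[r])
    total = qs[0] / d[0]                       # x = 0 region: D is constant
    for r in range(len(qs) - 1):               # plateaus: int dq / D = ln(D_lo / D_hi) / x
        total += math.log(d[r] / d[r + 1]) / xs[r]
    total += math.log(1.0 - qs[-1])            # -int_0^{q_R} dq / (1 - q)
    return 0.5 * total

def minus_beta_fs_quadrature(qs, xs, n=20000):
    """Same quantity by the midpoint rule; the integrand vanishes above q_R."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        q = (i + 0.5) * h
        total += 1.0 / D_of_q(q, qs, xs) - 1.0 / (1.0 - q)
    return 0.5 * total * h

if __name__ == "__main__":
    print(minus_beta_fs_steps([0.6], []))              # RS case
    print(minus_beta_fs_steps([0.3, 0.7], [0.4]))      # one nontrivial plateau
    print(minus_beta_fs_quadrature([0.3, 0.7], [0.4]))
```

For a single breakpoint the closed form reduces to the RS expression (1/2)[ln(1-q) + q/(1-q)], which the document uses later in (7.81b).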

There is no physical restriction on the type of extremum in terms of the auxiliary fields f(q,y) and P(q,y); we
keep, therefore, the more general "extr" condition. The auxiliary functional f_{a1}[...] enforces the PDE (7.4a). The
form of f_{a2}[...] can be understood if we impose the initial condition on the PDE by adding the term

$$ \delta(q-1) \left[ V(y) - f(q,y) \right] \tag{7.25} $$

to the l. h. s. of (7.4a). The ambiguity of the Dirac delta centered at q = 1 can again be taken care of by our using
$\delta(q - 1^{-0})$ whenever necessary. Note that the sign of expression (7.25) matters: it is the above choice that forces the
right initial condition no matter what f(1,y) was before. This feature can be shown by considering an infinitesimal
decrement in q from 1 in the PDE complemented by (7.25). The two auxiliary terms (7.24d) and (7.24e) can be
concatenated, and variation by P(q,y) gives the PDE (7.4) for f(q,y), initial condition included. For the sake of
clarity we keep (7.24e), specifying the initial condition, separate. The terms (7.24b, 7.24c) are identical to (7.10, 7.3),
respectively.

Given the constraint on f(q,y) by the Lagrange term, one should vary f(q,y) independently, yielding the PDE
(7.9) for P(q,y) including the initial condition, with the notation (7.5). Variation by x(q) can then be done while
f(q,y) and P(q,y) are kept fixed, and we find that

$$ \frac{\delta f[x(q), f(q,y), P(q,y)]}{\delta x(q)} \tag{7.26} $$

is equal to (7.20). It should not cause confusion that the free energy functional f[...] and the auxiliary field f(q,y)
have the same symbol, because the argument tells the difference. The variational free energy with the PPDE included
as constraint was one of our main results in Ref. [17].

It should be emphasized that while the variational formalism is very useful for the description of the equilibrium
properties, it does not account for such fluctuations of the matrix elements of Q that cannot be captured by the OPF
x(q). Thus in order to study thermodynamical stability we need to resort to the more general framework of Section

4. On thermodynamical stability



Based on the formulas derived in Section we can give an explicit expression for the replicon spectrum in terms
of q, y-dependent fields. We will only treat explicitly the spherical neuron; generalization for arbitrary independent
synapses is, in principle, straightforward.

The free energy, as function of the Q matrix, is the sum of the entropic and the energy terms. Due to the fact that
both undergo the same scheme of spontaneous RSB, their Hessians can be simultaneously quasi-diagonalized, based
solely on the ultrametric symmetry of the Hessian (see Section V C 1). This results in the longitudinal-anomalous
sector (R + 1) x (R + 1) matrices in the diagonals, and in the replicon sector the replicon eigenvalues as diagonal
elements. Hence a replicon eigenvalue of the complete Hessian is the sum of the two eigenvalues, one from the entropic
term and one from the energy term. We do not deal with the longitudinal-anomalous sector in the general case, mostly
because complete diagonalization there depends on the specific system under consideration.

The entropic eigenvalue has been calculated in Section VI D as (6.33), so we have

$$ \lambda_s(q_1,q_2,q_3) = \left[ D(q_2)\, D(q_3) \right]^{-1}. \tag{7.27} $$



In order to get the contribution from the energy term, we introduce the GF for the field f. Using the fact that f and
φ are proportional we obtain

$$ G(q_1,y_1;q_2,y_2) = \frac{\delta f(q_1,y_1)}{\delta f(q_2,y_2)} = \frac{\delta \varphi(q_1,y_1)}{\delta \varphi(q_2,y_2)} = G_\varphi(q_1,y_1;q_2,y_2). \tag{7.28} $$



Note that on the l. h. s. we omitted the subscript f that we consider the default. Hence by (4.85) we obtain the vertex
function Γ. Then the eigenvalue from the energy term is given by (5.66) with the substitution κ(q,y) = -βχ(q,y),
where χ(q,y) satisfies the PDE (7.8a), yielding

$$ \lambda_e(q_1,q_2,q_3) = -\beta^2 \int dy_2\, dy_3\, \Gamma(q_1;0,0;q_2,y_2;q_3,y_3)\, \chi(q_2,y_2)\, \chi(q_3,y_3). \tag{7.29} $$

The final formula for the replicon spectrum is thus

$$ \lambda(q_1,q_2,q_3) = \lambda_s(q_1,q_2,q_3) + \alpha\, \lambda_e(q_1,q_2,q_3). \tag{7.30} $$

Note that here the solutions of the relevant PDE-s were assumed to be known.



The WTI discussed in Section V C 3 implies the existence of zero modes. Indeed, using the fact that the functional
derivative (7.20) is made up of two-replica correlators, by (5.69) we have

$$ \lambda(q,q,q) = \beta^2\, \dot F(q,[\dots]) \equiv \lambda(q) \tag{7.31} $$

as the WTI for the spherical neuron. Here the dot means derivative in terms of the explicit q-dependence. But
stationarity for strictly increasing segments of x(q) means the vanishing of the r. h. s., so the eigenvalue for such q-s
is zero. Note that in an R-RSB scheme stationarity at the q_r-s does not imply the vanishing of (7.31). Based on the
interpretation, quoted in Section V C 3, of the WTI as a consequence of spontaneously broken permutation symmetry,
the zero modes found here can be considered as Goldstone modes of the symmetry broken phase.

In order to decide about thermodynamical (linear) stability of a stationary x(q), the analysis of the full replicon
spectrum is necessary.



5. Main types of the OPF 

It has been the experience in the study of various long range interaction disordered systems that only a few main
types for the OPF x(q) satisfy the stationarity condition and are at least marginally stable at the same time [pi| .
Below we review those that appear in the storage problem.

The R-RSB ansatz (see Eq. (4.21)) proved to describe thermodynamical equilibrium for R = 0 and R = 1 in several
different systems in some parameter range. The former is the RS, the latter the 1RSB state. Interestingly, we have
not found any examples in the literature when R-RSB with R > 1 would have described thermodynamical equilibrium.
In the storage problem Whyte and Sherrington |ll|] have shown that at T = 0 all finite R-RSB solutions are unstable.
In fact, the eigenvalue causing instability is -∞, a typical T = 0 phenomenon, also observed for such eigenvalues in
the SK model.

As to CRSB states, the shape of the OPF that corresponds to the phase discovered by Parisi in the SK model is
displayed in (4.44). In the nomenclature of [229] this is the SG-I state.

Another type of phase also arises in the storage problem, namely, a concatenation of a 1-RSB plateau and a strictly
increasing segment of the OPF. This has the form

$$ x(q) = \begin{cases} 1 & \text{if } q_{(1)} \le q \le 1, \\ x_c(q) & \text{if } q_1 \le q < q_{(1)}, \\ x_1 & \text{if } q_{(0)} \le q < q_1, \\ 0 & \text{if } 0 \le q < q_{(0)}. \end{cases} \tag{7.32} $$

Such an OPF has been observed in spin glasses with spins of more than two states, like the Potts model |2q] . This type
of continuous OPF with a plateau has been termed SG-IV in Ref. [229].



6. Stationarity and its consequences 



The stationarity conditions displayed in Section VII A 2 can be cast in more useful forms. First of all, note that
also the entropic term is of the generic form (4.1), as shown in Eq. (6.23) of Section VI D. We will thus formulate
stationarity in terms of the correlators in (7.20b). The fields for the energy term will not be labeled, while the fields
belonging to the entropic term and treated in Section VI D will carry now the subscript s like φ_s, μ_s, κ_s, and P_s.

The stationarity conditions for the regions of positive P(q) can be cast into an equation that holds for all q-s as

$$ \int_0^q d\bar q\, \dot x(\bar q)\, F[\bar q, x(\bar q)] = 0, \tag{7.33} $$

where F was defined in Eq. (7.20b). Indeed, F = 0 must hold unless $P(q) \equiv \dot x(q) = 0$. The lower limit of integration can
be safely chosen to be zero. Alternatively, we have a combined stationarity condition that contains the requirements
about plateaus and smooth x_c(q) segments, but can be imposed only at q-s where P(q) > 0, namely

$$ \int_0^q d\bar q\, x(\bar q)\, F[\bar q, x(\bar q)] = 0. \tag{7.34} $$



Next we summarize a few identities that follow from the PDE-s of Section VII A 1 for the fields in the energy term

$$ \frac{d}{dq} \int dy\, P(q,y)\, f(q,y) = -\frac12\, \beta x(q) \int dy\, P(q,y)\, m(q,y)^2, \tag{7.35a} $$

$$ \frac{d}{dq} \int dy\, P(q,y)\, m(q,y) = 0, \tag{7.35b} $$

$$ \frac{d}{dq} \int dy\, P(q,y)\, \chi(q,y) = \beta x(q) \int dy\, P(q,y)\, \chi(q,y)^2, \tag{7.35c} $$

$$ \frac{d}{dq} \int dy\, P(q,y)\, m(q,y)^2 = \int dy\, P(q,y)\, \chi(q,y)^2. \tag{7.35d} $$

Similar identities among the fields φ_s, μ_s, κ_s, and P_s of the spherical entropic term can be naturally obtained, when
the factors -β are erased as well.

Note that the PPDE (7.4) was used in deriving (7.35a). According to what has been said in Section [VBC, for a
discontinuous potential V(y) the PPDE is invalid at q = 1, so (7.35a) holds only for q-s where f(q,y) is smooth in y,
generically for q < 1. Let us consider the integral of (7.35a)

$$ \int dy\, P(q,y)\, f(q,y) - f(0,0) = -\frac{\beta}{2} \int_0^q d\bar q\, x(\bar q) \int dy\, P(\bar q,y)\, m(\bar q,y)^2. \tag{7.36} $$



Suppose that the PPDE (7.4a) holds for any q < 1, furthermore, that both sides are continuous in q at q = 1, a
condition that is met if β is finite. Due to the first assumption (7.36) holds for any q < 1, and due to continuity it
does so also at q = 1.

First we consider (7.33). After partially integrating it, and recalling (7.20) where F was a sum of two correlation
functions, we can use the relations (7.35c, 7.35d) to express the $x(q)\dot F$ term as a derivative in q. The result is

$$ \beta \int_0^q d\bar q\, \dot x(\bar q)\, F[\bar q, x(\bar q)] = \beta x(q)\, F[q,x(q)] + \left[ \int dy \left( \beta^{-1} P_s(\bar q,y)\, \kappa_s(\bar q,y) + \alpha\, P(\bar q,y)\, \chi(\bar q,y) \right) \right]_{\bar q = 0}^{\bar q = q} = 0. \tag{7.37} $$

The subscript s refers to the fields related to the entropic term, discussed in Section VI D. It follows from Eq. (6.27)
that

$$ \kappa_s(q,y) = \partial_y \mu_s(q,y) = -\frac{1}{D(q)}, \tag{7.38} $$

hence

$$ \beta x(q)\, F[q,x(q)] - \frac{1}{\beta D(q)} + \alpha \int dy\, P(q,y)\, \chi(q,y) = -\frac{1}{\beta D(0)} + \alpha\, \chi(0,0), \tag{7.39} $$



where we took into account that P(0,y) = δ(y). Obviously, for $P(q) \equiv \dot x(q) > 0$ the first term on the l. h. s. vanishes
by (7.21), thus the rest is constant for such q-s. Note that this constant is nontrivial, in contrast to some spin models
p9],Bl||, where the analogous constant vanishes.

The form (7.34) immediately suggests the use of (7.36) and yields for any q with P(q) > 0 the following form for
the free energy

$$ f = \beta^{-1} \varphi_s(0,0) + \alpha f(0,0) = \beta^{-1} \int dy\, P_s(q,y)\, \varphi_s(q,y) + \alpha \int dy\, P(q,y)\, f(q,y). \tag{7.40} $$

This equation remains true in the interval [0, q_(0)], where x(q) ≡ 0. So there the fields P and f are mutually adjoint.
Substituting q = 0 we get the standard expression (7.12). From Section VI D, we can recalculate the spherical
contribution



1 / dyP s (q,y)cp s (q,y) 



W Jo 



D{qf 



(7.41) 



that can be substituted into (7.40) to yield a more useful formula. Note that (7.40) is not an alternative form for the
free energy functional, rather an expression for the free energy at stationarity.

We mention that differentiations of (7.11) yield further stationarity conditions, valid only in intervals, but not at
isolated points, where P(q) > 0. We display the first one

$$ \dot F[q,x(q)] = \frac{1}{\beta^2 D^2(q)} - \alpha \int dy\, P(q,y)\, \chi(q,y)^2 = 0, \tag{7.42} $$



where (7.35d) was used. The same formula, without becoming zero, is useful in R-RSB schemes. By Eq. (7.31) it
represents a replicon eigenvalue with coinciding q arguments

$$ \lambda(q) = \frac{1}{D^2(q)} - \alpha \beta^2 \int dy\, P(q,y)\, \chi(q,y)^2. \tag{7.43} $$

For R = 0 (RS solution) this is at q = q_0 the AT eigenvalue, for R = 1 it gives at q = q_1 the typically most dangerous
eigenvalue, responsible for the destabilization of the 1-RSB state.



7. The entropy 



Based on the identity (7.36) the entropy (7.16) can be cast into the alternative form

$$ s = \frac12 \int_0^1 dq \left[ \frac{1}{D(q)} - \frac{1}{1-q} \right] - \frac12\, \alpha\beta^2 \int_0^1 dq\, x(q) \int dy\, P(q,y)\, m(q,y)^2. \tag{7.44} $$

This is valid when (7.36) can be extended to q = 1, for example at finite temperatures.

It is useful here to separate the interval for q-integration into (0, q_(1)) and (q_(1), 1). Consider the first term -βf_s[x(q)]
on the r. h. s. of (7.44)

$$ \begin{aligned} -\beta f_s[x(q)] &= \frac12 \int_0^{q_{(1)}} \frac{dq}{D(q)} + \frac12 \ln(1-q_{(1)}) \\ &= \frac12\, \alpha\beta^2 \int_0^{q_{(1)}} dq\, x(q) \int dy\, P(q,y)\, m(q,y)^2 + \frac12 \ln(1-q_{(1)}) + \frac12\, (1-q_{(1)})\, \alpha\beta^2 \int dy\, P(q_{(1)},y)\, m(q_{(1)},y)^2. \end{aligned} \tag{7.45} $$



The last relation comes from the alternative stationarity condition (7.34), which can indeed be applied, because for
the upper limit of integration q_(1), the Edwards-Anderson order parameter, we have P(q_(1)) > 0. Substitution into
(7.44) leads to cancellation, thus

$$ s = \frac12 \ln(1-q_{(1)}) + \frac{\alpha\beta^2}{2}\, (1-q_{(1)}) \int dy\, P(q_{(1)},y)\, m(q_{(1)},y)^2 - \frac{\alpha\beta^2}{2} \int_{q_{(1)}}^1 dq \int dy\, P(q,y)\, m(q,y)^2, \tag{7.46} $$

where x(q) ≡ 1 was used in the third term. It follows from (7.35d) that ∫dy P(q,y) m(q,y)², in general, strictly
increases in q. But for an increasing function h(q)

$$ \int_{q_{(1)}}^1 dq\, h(q) > (1-q_{(1)})\, h(q_{(1)}), \tag{7.47} $$

therefore in (7.46) the second and third terms together are generally negative. The first term is obviously negative,
thus so is the entropy. The above formulation is useful, besides for the consistency check of negativity of the entropy,
because the constituent functions are needed only in [q_(1), 1]. There the only nontrivial ingredient is P(q,y), for
m(q,y) is explicitly given by a Gaussian integral over the known m(1,y).



8. The high temperature limit 



At high temperatures, if the relative number of examples α is appropriately rescaled, the neuron exhibits nontrivial
thermodynamical properties. (As has been pointed out to the author by M. Opper, the limit studied here is equivalent
to the thermodynamics of an N-dimensional vector in a Gaussian random, quenched, potential, with variance
characterized by the function W(q) of (7.48).) This should be contrasted with the fully connected SK-type spin
glasses, which are paramagnetic in the high-T limit.

For β → 0 the energy term (4.2) can be expanded in terms of the potential and the results of Section VI E apply.
We identify in Eq. (6.34) ε with β and Φ(y) with -V(y). In analogy with the definition (6.39) we introduce

$$ W(q) = \int Dz_1\, Dz_2\, V(\mathbf{n}_1 \mathbf{z})\, V(\mathbf{n}_2 \mathbf{z}), \tag{7.48a} $$

$$ |\mathbf{n}_i| = 1, \quad \mathbf{n}_1 \mathbf{n}_2 = q. \tag{7.48b} $$

The energy term in the free energy functional is expanded as

$$ f_e[x(q)] = f_{e0} + \beta f_{e1}[x(q)] + O(\beta^2), \tag{7.49} $$

where from the expansion formulas of Section VI E we have

$$ f_{e0} = \int Dz\, V(z) = \sqrt{W(0)}, \tag{7.50} $$

which does not depend on the OPF x(q), and

$$ f_{e1}[x(q)] = -\frac12 \int_0^1 dq\, x(q)\, \dot W(q). \tag{7.51} $$

The relative number of examples α should be scaled so that

$$ \gamma = \alpha \beta^2 \tag{7.52} $$

remains finite. Large α-s will counterbalance the homogenizing effect of high temperatures. The full free energy
functional is thus singular in the small β limit such that

$$ \beta f[x(q)] = \beta^{-1} \phi_0 + \phi_1[x(q)] + O(\beta), \tag{7.53} $$

where

$$ \phi_0 = \gamma f_{e0}, \tag{7.54a} $$

$$ \phi_1[x(q)] = -\frac12 \int_0^1 dq \left[ \frac{1}{D(q)} - \frac{1}{1-q} + \gamma\, x(q)\, \dot W(q) \right]. \tag{7.54b} $$



The entropic contribution was inserted from Eq. (7.10). The term β^{-1}φ_0 is singular for β → 0 but is independent of the
OPF x(q), thus it does not lead to meaningful thermodynamics. The important feature here is that the third term in
(7.54b) is linear in x(q), because expansion in β is equivalent to expansion in x(q) in the PPDE (7.4a).

The term φ_1[x(q)] is equivalent to the free energy functional of Nieuwenhuizen's spherical multi-p-spin interaction
spin glass, a most general SK-type spherical system, incorporating the spherical SK and the more general p-spin
interaction models 16 . Note that the above result can be obtained also by solving the PPDE perturbatively in β, a
longer calculation.
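The function W(q) of (7.48) can be evaluated numerically once a potential is chosen. The sketch below is only an illustration under stated assumptions: the quadratic-hinge potential, the value of κ and all function names are hypothetical choices, and the two correlated unit Gaussians n_1·z and n_2·z are represented as z_1√q + z_2√(1-q) with a shared z_1. It checks the limits W(0) = (∫Dz V)², which is behind f_e0 = √W(0) in (7.50), and W(1) = ∫Dz V².

```python
import math

def gauss_average(func, half_width=8.0, n=200):
    """Approximate int Dz func(z), Dz = exp(-z^2/2) dz / sqrt(2 pi), by Simpson's rule."""
    h = 2.0 * half_width / n                 # n must be even
    total = 0.0
    for i in range(n + 1):
        z = -half_width + i * h
        weight = 1.0 if i in (0, n) else (4.0 if i % 2 else 2.0)
        total += weight * math.exp(-0.5 * z * z) * func(z)
    return total * h / (3.0 * math.sqrt(2.0 * math.pi))

def potential(y, kappa=0.5):
    """Hypothetical error measure: zero for y > kappa, quadratic below it."""
    d = kappa - y
    return d * d if d > 0.0 else 0.0

def W(q, V=potential):
    """W(q) = <V(y1) V(y2)> for unit Gaussians y1, y2 with correlation q."""
    s, c = math.sqrt(q), math.sqrt(1.0 - q)
    def inner(y):
        # average over the independent part of one replica's field
        return gauss_average(lambda z2: V(y + c * z2))
    # both replicas share the common part s * z1, hence the square
    return gauss_average(lambda z1: inner(s * z1) ** 2)

if __name__ == "__main__":
    for q in (0.0, 0.5, 1.0):
        print(q, W(q))
    print("f_e0 =", gauss_average(potential))
```

Writing W(q) as the average of a square also makes it plain that its q-derivative, the correlation of V', is non-negative.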



Variation of the O(β⁰) term of (7.53) gives

$$ \frac{\delta \phi_1[x(q)]}{\delta x(q)} = \frac12\, F_0(q,[x(q)]), \qquad F_0(q,[x(q)]) = \int_0^q \frac{d\bar q}{D(\bar q)^2} - \gamma\, \dot W(q), \tag{7.55} $$

the leading term in formula (7.20b) for β → 0. In intervals with $\dot x(q) > 0$ the stationarity condition F_0(q,[x(q)]) = 0
can be explicitly solved for the segment x_c(q) of the OPF to give

$$ x_c(q) = \frac{\dddot W(q)}{2\, \gamma^{1/2}\, \ddot W(q)^{3/2}}. \tag{7.56} $$

If at a q_r the stationary OPF exhibits a step then F_0(q_r,[x(q)]) = 0 holds, and for a plateau x(q) ≡ x of value
0 < x < 1 in the interval I the condition (5.47) should be applied.
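The continuous segment (7.56) can be checked numerically. Differentiating F_0 = 0 once in q gives D(q) = [γ Ẅ(q)]^{-1/2}, and since x(q) = -dD/dq this must reproduce x_c(q). The sketch below does that check for a purely hypothetical W(q) = q²/2 + q⁴/4, chosen only because its derivatives are simple; it is not derived from any particular potential.

```python
import math

def W2(q):
    """Second derivative of the hypothetical W(q) = q^2/2 + q^4/4."""
    return 1.0 + 3.0 * q * q

def W3(q):
    """Third derivative of the same hypothetical W(q)."""
    return 6.0 * q

def x_c(q, gamma):
    """Continuous OPF segment, Eq. (7.56)."""
    return W3(q) / (2.0 * math.sqrt(gamma) * W2(q) ** 1.5)

def D_c(q, gamma):
    """D(q) on the continuous segment, from the q-derivative of F_0 = 0."""
    return 1.0 / math.sqrt(gamma * W2(q))

if __name__ == "__main__":
    gamma, h = 2.0, 1e-5
    for q in (0.1, 0.2, 0.3):
        slope = -(D_c(q + h, gamma) - D_c(q - h, gamma)) / (2.0 * h)
        print(q, x_c(q, gamma), slope)   # the last two columns should agree
```

For this choice x_c(q) increases with q only for q below 1/√6, so a physically admissible segment would be confined to that range.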

The replicon eigenvalues can be easily calculated by our adding the contributions from the entropic term (6.33)

$$ \lambda_{s0}(q_1,q_2,q_3) = \left[ D(q_2)\, D(q_3) \right]^{-1} \tag{7.57} $$

and the result from the expansion (6.59) applied to the energy term,

$$ \alpha\, \lambda_{e0}(q_1,q_2,q_3) = -\gamma\, \ddot W(q_1). \tag{7.58} $$

Thus in leading order λ_e depends only on one q-variable. Adding them up gives for the replicon spectrum

$$ \lambda_0(q_1,q_2,q_3) = \lambda_{s0}(q_1,q_2,q_3) + \alpha\, \lambda_{e0}(q_1,q_2,q_3). \tag{7.59} $$

This is the leading term in (7.30). If q_1 falls into an interval where $\dot x(q) > 0$ then by (7.56) we have $\gamma \ddot W(q_1) = 1/D(q_1)^2$.
Since q_1 ≤ q_2, q_3 ≤ q_(1) and D(q) monotonically decreases, it is easy to see that λ_0(q_1,q_2,q_3) ≥ 0. So these replicons
are never linearly unstable. Such a general statement cannot be made if q_1 is a discontinuity point between two
plateaus of x(q). Longitudinal stability (we have not discussed this question in the general case) can also be checked
explicitly in the high-T limit. Indeed, the second variation of the free energy functional gives

$$ \frac{\delta^2 \phi_1[x(q)]}{\delta x(q_1)\, \delta x(q_2)} = -\int_0^{\min(q_1,q_2)} \frac{dq}{D(q)^3}. \tag{7.60} $$

This is negative definite, as shown in App. ||, so the extremum of f[x(q)] is indeed maximum as required in (7.23).
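The sign argument for the high-temperature replicon spectrum (7.57)-(7.59) can be illustrated on a grid. The sketch assumes a continuous segment on which D(q) = [γ Ẅ(q)]^{-1/2}, and again uses a hypothetical W(q) = q²/2 + q⁴/4 whose second derivative increases with q; the names and numbers are illustrative only.

```python
import math

def W2(q):
    """Second derivative of the hypothetical W(q) = q^2/2 + q^4/4."""
    return 1.0 + 3.0 * q * q

def D_c(q, gamma):
    """D(q) on a continuous segment, where gamma * W''(q) = 1 / D(q)^2."""
    return 1.0 / math.sqrt(gamma * W2(q))

def lambda0(q1, q2, q3, gamma):
    """Leading replicon eigenvalue: 1 / (D(q2) D(q3)) - gamma * W''(q1)."""
    return 1.0 / (D_c(q2, gamma) * D_c(q3, gamma)) - gamma * W2(q1)

def min_eigenvalue(gamma, lo=0.05, hi=0.4, n=15):
    """Smallest lambda0 over grid points with q1 <= q2 and q1 <= q3."""
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return min(lambda0(q1, q2, q3, gamma)
               for q1 in grid for q2 in grid for q3 in grid
               if q1 <= q2 and q1 <= q3)

if __name__ == "__main__":
    print(min_eigenvalue(2.0))            # zero up to rounding: marginal stability
    print(lambda0(0.1, 0.2, 0.3, 2.0))    # strictly positive for q1 < q2, q3
```

The minimum is attained at coinciding arguments, where the eigenvalue vanishes, in line with the zero modes implied by (7.31).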

The main local quantity of interest is the stability Δ associated with individual patterns. The distribution of
stability parameters ρ(Δ) is by (7.14) determined through the field P(q,y), so we have to solve the SPDE (4.53, 4.54)
for high temperatures. As described in Appendix O, we find

$$ \rho(y) = \rho_0(y) + \beta \rho_1(y) + O(\beta^2), \tag{7.61} $$

where

$$ \rho_0(y) = G(y,1), \tag{7.62a} $$

$$ \rho_1(y) = \partial_y \left[ G(y,1) \int_0^1 dq\, x(q) \int Dz\, V'\!\left( yq + z\sqrt{1-q^2} \right) \right]. \tag{7.62b} $$

The first correction ρ_1 shows the deviation from the Gaussian, an effect that is expected to be dramatic for low
temperatures. The error per pattern is by (3.29)

$$ \varepsilon = \varepsilon_0 + \beta \varepsilon_1 + O(\beta^2), \tag{7.63} $$

where

$$ \varepsilon_0 = \int dy\, \rho_0(y)\, V(y) = \sqrt{W(0)}, \tag{7.64a} $$

$$ \varepsilon_1 = \int dy\, \rho_1(y)\, V(y) = -\int_0^1 dq\, x(q)\, \dot W(q). \tag{7.64b} $$

The leading term of the entropy is obtained from the definition (3.16) as

$$ s_0 = \frac12 \int_0^1 dq \left[ \frac{1}{D(q)} - \frac{1}{1-q} - \gamma\, x(q)\, \dot W(q) \right]. \tag{7.65} $$



9. Scaling by temperature 

Gardner and Derrida recognized that at T = 0, for positive stability threshold κ (see Eq. ( j^ ) for the definition),
when the limit of capacity was approached, the RS order parameter q converged to unity ^,|| . This is a manifestation
of the fact that, at the limit of capacity, the volume of version space, compatible with the patterns to be stored, no
longer diverges exponentially in N [19]. In other words, the volume per synapse goes to zero if N → ∞, accompanied
by the divergence of the entropy to -∞. Discrete R-RSB calculations at T = 0, beyond capacity, showed that the
q-s belonging to any 0 < x < 1 were also equal to unity. At the same time, q < 1 values were associated with the
variable ξ = βx, when this was kept finite in the limit x → 0 and β → ∞. Furthermore, it is plausible to assume that
for T → 0, i. e., q_(1) → 1, a positive limit of x_R → x_(1) = x(q_(1)) < 1 exists.

The above observations suggest a natural scaling for the OPF, valid for any temperatures, but providing a smooth
T → 0 limit. Let us introduce

$$ q(t) = q_{(1)} - (q_{(1)} - q_{(0)}) \left( 1 - (1+q_{(1)})\, t + q_{(1)}\, t^2 \right), \quad 0 \le t \le 1, \tag{7.66a} $$

$$ \xi(t) = \beta\, x(q(t))\, \dot q(t), \tag{7.66b} $$

$$ \eta = \beta\, (1 - q_{(1)}), \tag{7.66c} $$

$$ \Delta(t) = \beta D(q(t)) = \int_t^1 d\bar t\, \xi(\bar t) + \eta, \tag{7.66d} $$



where the subscripted q-s are defined in (4.42), x(q) vanishes for 0 ≤ q < q_(0) and x(q) ≡ 1 for 1 ≥ q > q_(1). The
time variable is changed to t via the invertible function q(t), that was constructed so that $\dot q(1) = 0$ for q_(1) = 1. The
scaled function Δ(t) is not to be confounded with the local stability parameter Δ. In the T → 0 limit we expect η
to be finite and thus the scaled OPF to be bounded. Indeed, in the most dangerous point, t = 1, i. e., q = q_(1),
where βx(q_(1)) may diverge, we have an expectedly finite

$$ \xi(1) = (q_{(1)} - q_{(0)})\, x_{(1)}\, \eta. \tag{7.67} $$

It is advisable to use the above scaling even for T > 0, because the scaled formulae remain manageable for small
temperatures. In what follows q(t) may in fact be any monotonic function with boundary conditions q(0) = q_(0) and
q(1) = q_(1); our taking the simple (7.66a) is just a numerically useful parametrization.
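The properties claimed for the parametrization (7.66a) are easy to confirm numerically. The following minimal sketch (function names and sample values are illustrative assumptions) checks the boundary values, the monotonicity of q(t), the vanishing of the derivative at t = 1 when q_(1) = 1, and the finite value (7.67) of ξ(1).

```python
def q_of_t(t, q0, q1):
    """Parametrization (7.66a); q0 and q1 stand for q_(0) and q_(1)."""
    return q1 - (q1 - q0) * (1.0 - (1.0 + q1) * t + q1 * t * t)

def dq_dt(t, q0, q1):
    """Derivative of (7.66a) with respect to t."""
    return (q1 - q0) * ((1.0 + q1) - 2.0 * q1 * t)

def xi_at_one(beta, x1, q0, q1):
    """xi(1) = beta * x(q_(1)) * dq/dt at t = 1, from (7.66b)."""
    return beta * x1 * dq_dt(1.0, q0, q1)

if __name__ == "__main__":
    q0, q1, beta, x1 = 0.2, 0.9, 50.0, 0.7
    eta = beta * (1.0 - q1)                      # Eq. (7.66c)
    print(q_of_t(0.0, q0, q1), q_of_t(1.0, q0, q1))
    print(xi_at_one(beta, x1, q0, q1), (q1 - q0) * x1 * eta)   # both sides of (7.67)
```

Since dq/dt is linear in t and positive at both ends of [0, 1] for q_(0) < q_(1) < 1, the map t → q(t) is invertible there.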

The auxiliary fields now depend on t and y, like f(t,y), m(t,y), etc. Of course, f(t,y) equals f(q(t),y) and not 
the field f(q, y) in the point t = q. We will write the arguments in the way that no ambiguity remains about which 
function is meant. The PPDE should be rewritten as 



$$ \partial_t f(t,y) = -\tfrac12\, \dot q(t)\, \partial_y^2 f(t,y) + \tfrac12\, \xi(t) \left( \partial_y f(t,y) \right)^2, \tag{7.68} $$

whose initial condition at t = 1 is the former f(q_(1), y). At this point it is worth displaying the f(q,y) in the interval
[q_(1), 1]. There x(q) ≡ 1, so Eqs. (7.1) and (4.35) define the Cole-Hopf transformed field, obeying linear diffusion, that
gives

$$ f(q,y) = -\frac{1}{\beta} \ln \int Dz\, e^{-\beta V\left( y + z\sqrt{1-q} \right)}, \quad q_{(1)} \le q \le 1. \tag{7.69} $$

The initial condition of the PPDE (7.68) is

$$ f(t=1,y) = f(q_{(1)},y), \tag{7.70} $$

whence for T → 0, after change of integration variable in (7.69) as $\bar y = y + z\sqrt{1-q}$, we get

$$ f(t=1,y) = \min_{\bar y} \left[ V(\bar y) + \frac{(y-\bar y)^2}{2\eta} \right]. \tag{7.71} $$

The expression on the r. h. s. first appeared in Ref. || as a free energy term in the RS approximation. For small q-s
we have

$$ f(q,y) = \int Dz\, f\!\left( q_{(0)},\, y + z\sqrt{q_{(0)}-q} \right), \quad 0 \le q \le q_{(0)}. \tag{7.72} $$

The rescaled SPDE reads as

$$ \partial_t P(t,y) = \tfrac12\, \dot q(t)\, \partial_y^2 P(t,y) + \xi(t)\, \partial_y \left( P(t,y)\, m(t,y) \right). \tag{7.73} $$
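The passage from the finite-temperature expression (7.69), taken at q = q_(1) = 1 - η/β, to the zero-temperature minimum (7.71) can be illustrated numerically. The sketch below is an assumption-laden example: the quadratic-hinge potential, the parameter values and the function names are hypothetical, and the integral is evaluated in the variable ȳ = y + z√(η/β) on a uniform grid, with the largest term factored out to avoid underflow.

```python
import math

def quadratic_hinge(y, kappa=0.5):
    """Hypothetical potential: zero for y > kappa, quadratic below it."""
    d = kappa - y
    return 0.5 * d * d if d > 0.0 else 0.0

def f_finite_beta(y, beta, eta, V, half_width=3.0, n=6000):
    """Eq. (7.69) at 1 - q = eta / beta, written as an integral over ybar."""
    h = 2.0 * half_width / n
    exponents = []
    for i in range(n + 1):
        ybar = y - half_width + i * h
        exponents.append(beta * (V(ybar) + (ybar - y) ** 2 / (2.0 * eta)))
    e_min = min(exponents)
    total = sum(math.exp(e_min - e) for e in exponents) * h
    # Gaussian normalization of Dz in the ybar variable: variance eta / beta
    log_integral = -e_min + math.log(total) - 0.5 * math.log(2.0 * math.pi * eta / beta)
    return -log_integral / beta

def f_zero_temperature(y, eta, V, half_width=3.0, n=6000):
    """Eq. (7.71): brute-force minimum over ybar on the same grid."""
    h = 2.0 * half_width / n
    best = float("inf")
    for i in range(n + 1):
        ybar = y - half_width + i * h
        best = min(best, V(ybar) + (ybar - y) ** 2 / (2.0 * eta))
    return best

if __name__ == "__main__":
    y, eta = -0.5, 0.5
    print("T = 0 limit:", f_zero_temperature(y, eta, quadratic_hinge))
    for beta in (50.0, 400.0):
        print(beta, f_finite_beta(y, beta, eta, quadratic_hinge))
```

For this particular potential the minimum can be done by hand and equals (κ - y)²/[2(1 + η)] for y < κ; the finite-β values approach it with a correction of order 1/β.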



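Along the plateau [0, q_(0)] we have x(q) ≡ 0, so the SPDE (4.53) is a linear diffusion equation, and in [q_(1), 1] the
Cole-Hopf-type transformation (4.61) leads to linear diffusion, whence

$$ P(q,y) = G(y,q), \quad 0 \le q \le q_{(0)}, \tag{7.74a} $$

$$ P(q,y) = e^{-\beta f(q,y)} \int Dz\, P\!\left( t=1,\, y + z\sqrt{q-q_{(1)}} \right) e^{\beta f\left( t=1,\, y + z\sqrt{q-q_{(1)}} \right)}, \quad q_{(1)} \le q \le 1. \tag{7.74b} $$

Thus the initial condition for P(t,y) in the SPDE is

$$ P(t=0,y) = P(q_{(0)},y) = G(y,q_{(0)}). \tag{7.75} $$

The stationarity condition (7.21) for q-s where $\dot x(q) > 0$, i. e.,

$$ \dot\xi(t)\, \dot q(t) - \xi(t)\, \ddot q(t) > 0, \tag{7.76} $$

reads now as

$$ \frac{q_{(0)}}{\Delta(0)^2} + \int_0^t d\bar t\, \frac{\dot q(\bar t)}{\Delta(\bar t)^2} - \alpha \int dy\, P(t,y)\, m(t,y)^2 = 0. \tag{7.77} $$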



Of course, if one has a nontrivial plateau within the t-interval (0,1), i. e., (7.76) fails in a subinterval, then (7.77)
is invalid in that subinterval and one should extremize by the parameters of the plateau extra. In the PDE-s and
the stationarity condition the temperature does not appear explicitly and allows for a smooth limit in case T → 0.
Assuming that we solved the above PDE-s, in the scaled variables the free energy becomes

$$ f = f_s + \alpha f_e, \tag{7.78a} $$

$$ f_s = -\frac{q_{(0)}}{2\Delta(0)} - \frac12 \int_0^1 \frac{\dot q(t)\, dt}{\Delta(t)} - \frac{1}{2\beta} \ln \frac{\eta}{\beta}, \tag{7.78b} $$

$$ f_e = \int Dz\, f\!\left( t=0,\, z\sqrt{q_{(0)}} \right). \tag{7.78c} $$

The distribution of local stabilities, based on (7.74b), is

$$ \rho(y) = P(q=1,y) = e^{-\beta V(y)} \int Dz\, P\!\left( t=1,\, y + z\sqrt{1-q_{(1)}} \right) e^{\beta f\left( t=1,\, y + z\sqrt{1-q_{(1)}} \right)}. \tag{7.79} $$

It is straightforward to show that ρ(y) is normalized if P(t = 1, y) was normalized, and the latter property follows
from the fact that the SPDE preserves the normalization of its initial condition (7.75). The mean error per pattern
can be calculated as

$$ \varepsilon = \int dy\, \rho(y)\, V(y). \tag{7.80} $$



In practical cases the limit of the local stability distribution for T → 0 can be calculated by the saddle point method
from (7.79) and contains only nonsingular, scaled variables.

Concerning the entropy, it is obvious from (7.44) that beyond capacity, at T = 0, the entropy is s = -∞. Indeed,
the first term of (7.46) goes to -∞ for β → ∞, and the rest being generally negative, see reasoning in the end of
Section VII A 7, it cannot compensate for the negative singularity. This complements the known result for κ = 0 that
when one approaches the capacity from below then the entropy diverges to -∞. Together with the fact that the
overlap beyond capacity equals 1 with probability 1, this demonstrates freezing in the ground state. This is an effect
analogous to the vanishing of the T = 0 entropy beyond capacity for the Ising perceptron [174]; both show that the
number of states with minimal error is subexponential in N.



10. The RS state and storage below capacity 



For a general potential V(y) that vanishes beyond a certain stability parameter, y > κ, and is positive below it, the
original results of Gardner j|]4[] describe the storage problem at T = 0 below capacity. The reason for that is that if
all examples are satisfied then the positive part of the error measure does not matter.

At finite temperatures the potential comes into play; the equations are easily obtained from what has been said
before. The free energy is a function of the only variational parameter q ≡ q_0 ≡ q_R ≡ q_(0) ≡ q_(1) as

$$ f(q) = f_s(q) + \alpha f_e(q), \tag{7.81a} $$

$$ -\beta f_s(q) = \frac12 \left[ \ln(1-q) + \frac{q}{1-q} \right], \tag{7.81b} $$

$$ -\beta f_e(q) = \int Dz_1 \ln \int Dz_2\, e^{-\beta V(\zeta)}, \tag{7.81c} $$

$$ \zeta = z_1 \sqrt{q} + z_2 \sqrt{1-q}. \tag{7.81d} $$

Stationarity is given by (7.21) that now reads as

$$ \beta^2 F(q) = \frac{q}{(1-q)^2} - \frac{\alpha}{q} \int Dz_1 \left[ \frac{\partial}{\partial z_1} \ln \int Dz_2\, e^{-\beta V(\zeta)} \right]^2 = 0, \tag{7.82} $$

with the abbreviation (7.81d), and the AT eigenvalue from (7.43) is

$$ \lambda(q) = \frac{1}{(1-q)^2} - \frac{\alpha}{q^2} \int Dz_1 \left[ \frac{\partial^2}{\partial z_1^2} \ln \int Dz_2\, e^{-\beta V(\zeta)} \right]^2. \tag{7.83} $$



Note that (7.31) is not in contradiction with λ(q) ≠ β² dF(q)/dq here. This is because in (7.31) the derivative is understood
by the explicit q dependence of F, while x(q) is fixed, but in F(q) both kinds of arguments are denoted by the same
q. Here λ(q) is only meaningful at the stationary q. The probability density of local stabilities is given by say (7.79),
now

$$ \rho(y) = e^{-\beta V(y)} \int Dz_1\, \frac{G\!\left( y + z_1 \sqrt{1-q},\, q \right)}{\int Dz_2\, e^{-\beta V\left( y + (z_1+z_2)\sqrt{1-q} \right)}}. \tag{7.84} $$



In the ground state (T = 0) below capacity the positive part of V(y) is suppressed. This means that for — > oo 
only arguments of V matter that are greater than k. Closer inspection shows that this reasoning holds only if q does 
not approach 1 at the same time. For f3 — > oo the free energy and the energy goes to zero, but /?/ remains typically 
finite, 



-/?/(«) = iin(i- ?) + ^_ 



a / Dz InH 



k — Zy/q 
V^~q 



where (4.101) was used for the definition of H(x). The ground state entropy per synapse is now 

s = -Pf, 



(7.85) 



(7- 



the subject of the pioneering works In the units of the prior volum e f w (J) d N J = 1, where (3.5) gives the prior 

density, the volume of version space is e Ns . The stationarity condition ( 7.82j ) simplifies in the ground state to 



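$$ \frac{q}{1-q} = \frac{\alpha}{2\pi}\int Dz\, \frac{\exp\left(-\frac{(\kappa - z\sqrt{q})^2}{1-q}\right)}{H^2\left(\frac{\kappa - z\sqrt{q}}{\sqrt{1-q}}\right)}, \tag{7.87} $$
and the AT eigenvalue becomes
$$ \lambda(q) = \frac{1}{(1-q)^2} - \frac{\alpha}{2\pi q\,(1-q)}\int Dz \left[\frac{d}{dz}\, \frac{\exp\left(-\frac{(\kappa - z\sqrt{q})^2}{2(1-q)}\right)}{H\left(\frac{\kappa - z\sqrt{q}}{\sqrt{1-q}}\right)}\right]^2. \tag{7.88} $$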



Numerical evaluation shows that for increasing $\alpha$ the overlap $q$ goes to 1 and the entropy decreases towards $-\infty$. For $q \lesssim 1$ the dominant contribution in the above expressions comes from the region of exponentially small $H$, i.e., where its argument is large and positive. That is ensured by $\kappa > z$. The asymptotics of $H(x)$ for large $x$ can be found in [230]:
$$ H(x) \simeq \frac{e^{-x^2/2}}{\sqrt{2\pi}\, x}. \tag{7.89} $$
Thus the limit $q \to 1$ is realized, from Eq. (7.87), for $\alpha \to \alpha_c(\kappa)$ satisfying
$$ 1 = \alpha_c(\kappa) \int_{-\infty}^{\kappa} Dz\, (\kappa - z)^2, \tag{7.90} $$
that evaluates to the capacity curve
$$ \alpha_c(\kappa)^{-1} = (\kappa^2 + 1)\,(1 - H(\kappa)) + \frac{\kappa\, e^{-\kappa^2/2}}{\sqrt{2\pi}}. \tag{7.91} $$



This gives $\alpha_c(0) = 2$, the known result of earlier works. The recovery of that by Gardner raised much confidence in the statistical mechanical approach combined with the replica method. For the $\kappa$-dependent capacity (7.90) several sources could be cited. The $\alpha_c(\kappa)$ curve is shown in Fig. 9; by tradition the horizontal axis is $\alpha$.

From (7.85) one can convince oneself of the negative divergence of the entropy for $\alpha \to \alpha_c$ from below. Alternatively, the conclusion at the end of Section VII A 9 about $s = -\infty$ also applies at the limit of capacity.

FIG. 9. The limit of capacity $\alpha_c$ from Eq. (7.91), solid line, and the indicator $\alpha^*$ of AT stability from (7.93), dashed line.
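
As a quick numerical cross-check of the capacity curve, the following minimal sketch evaluates the closed form (7.91) and compares it with a direct quadrature of the defining integral (7.90). It assumes $H(x) = \int_x^\infty Dz$ with the standard Gaussian measure; the function names and the quadrature parameters are illustrative choices, not part of the original text.

```python
import math

def H(x):
    """Gaussian tail H(x) = int_x^infinity Dz (assumed definition of (4.101))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def alpha_c(kappa):
    """Capacity from the closed form (7.91)."""
    inverse = (kappa ** 2 + 1.0) * (1.0 - H(kappa)) \
        + kappa * math.exp(-0.5 * kappa ** 2) / math.sqrt(2.0 * math.pi)
    return 1.0 / inverse

def alpha_c_quadrature(kappa, z_min=-12.0, n=20000):
    """Capacity from (7.90): midpoint rule for int_{-inf}^{kappa} Dz (kappa - z)^2.
    The lower limit is truncated at z_min (hypothetical cutoff)."""
    h = (kappa - z_min) / n
    total = 0.0
    for i in range(n):
        z = z_min + (i + 0.5) * h
        total += math.exp(-0.5 * z * z) * (kappa - z) ** 2
    return 1.0 / (total * h / math.sqrt(2.0 * math.pi))

if __name__ == "__main__":
    for kappa in (-1.0, 0.0, 0.5, 1.0):
        print(kappa, alpha_c(kappa), alpha_c_quadrature(kappa))
```

At $\kappa = 0$ the closed form reduces to $1/(1/2) = 2$, in line with the value quoted above.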



A glance at the AT eigenvalue (7.88) reveals that $\lambda$ is typically singular for $q \to 1$. Decisive is its sign, which can be determined in that limit by again using the asymptotics (7.89) for $\alpha < \alpha_c(\kappa)$. If the latter is inserted into (7.88) before differentiation, one immediately sees that the amplitude of the singularity is
$$ \lambda(q)\,(1-q)^2 \to 1 - \frac{\alpha}{\alpha^*(\kappa)}, \tag{7.92} $$
where we have given a name to
$$ \alpha^*(\kappa) = \frac{1}{H(-\kappa)} \tag{7.93} $$
and depicted it in Fig. 9. It follows that if $\alpha_c < \alpha^*$ ($\alpha_c > \alpha^*$) then the AT eigenvalue has a positive (negative) singularity on the capacity line. Since the RS solution presented here is only valid for $q < 1$, below the capacity line, we can conclude that for $\kappa > 0$ this region is AT-stable. This RS state ceases to exist at the critical line not because of AT instability; rather, the overlap $q$ reaches the border $q = 1$ of its physical range. We mention but do not further elaborate on the property that for negative $\kappa$ the RS solution destabilizes before the capacity is reached. Thus RSB is necessary below capacity, like in [214], but in the present case this occurs without the need for non-monotonic potentials. Note that $\alpha^*$ is not the AT stability boundary for the RS solution, because it was calculated in the limit $q \to 1$.

In sum, for $\kappa > 0$ the region below capacity for $T = 0$ can be described by the RS solution. However, beyond it the particular form of the potential $V(y)$ affects the behavior, so we specify one for further studies.
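
The sign argument above can be illustrated numerically. The sketch below compares the capacity $\alpha_c(\kappa)$ with the indicator $\alpha^*(\kappa)$, assuming that the $q \to 1$ amplitude of the AT eigenvalue is $1 - \alpha/\alpha^*(\kappa)$ with $\alpha^*(\kappa) = 1/H(-\kappa)$, which is what the asymptotics of $H$ gives when only the region $z < \kappa$ contributes. Function names are illustrative.

```python
import math

def H(x):
    """Gaussian tail H(x) = int_x^infinity Dz."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def alpha_c(kappa):
    """Capacity curve, closed form."""
    inverse = (kappa ** 2 + 1.0) * (1.0 - H(kappa)) \
        + kappa * math.exp(-0.5 * kappa ** 2) / math.sqrt(2.0 * math.pi)
    return 1.0 / inverse

def alpha_star(kappa):
    """Indicator of AT stability near q -> 1 (assumed form 1 / H(-kappa))."""
    return 1.0 / H(-kappa)

def at_amplitude_on_capacity_line(kappa):
    """Limit of lambda(q) (1 - q)^2 for q -> 1, evaluated at alpha = alpha_c(kappa)."""
    return 1.0 - alpha_c(kappa) / alpha_star(kappa)

if __name__ == "__main__":
    for kappa in (-1.0, -0.5, 0.0, 0.5, 1.0):
        print(kappa, alpha_c(kappa), alpha_star(kappa),
              at_amplitude_on_capacity_line(kappa))
```

Under these assumptions the amplitude is positive for $\kappa > 0$, vanishes at $\kappa = 0$ where both curves pass through 2, and is negative for $\kappa < 0$.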



B. The special error measure $\theta(\kappa - y)$



Now we apply the framework of Section VII A to the error measure (3.9) with $b = 0$, that is, to
$$ V(y) = \theta(\kappa - y), \tag{7.94} $$
a much studied case. This potential does not weigh erroneous patterns by "how much" wrong they are as measured by the local stability parameter $\Delta$; it simply counts them. We will often use the function $H(y)$ as defined by (4.101). The initial condition for the PPDE is obtained by substitution of (7.94) into Eqs. (7.69, 7.70):
$$ f(q_{(1)}, y) = -\beta^{-1} \ln\left[ e^{-\beta} + \left(1 - e^{-\beta}\right) H\left(\frac{\kappa - y}{\sqrt{1 - q_{(1)}}}\right) \right], \tag{7.95} $$
whence the initial conditions for $m(q, y)$, etc. follow by differentiation with respect to $y$.



1. The ground state 



For $\alpha > \alpha_c$ none of the finite $R$-RSB solutions are thermodynamically stable, thus it is necessary to resort to a CRSB ansatz. In the time parameter $t$ the initial conditions at $t = 1$ are as follows for $T = 0$. The initial condition for the field $f(t, y)$ is given by (7.71). The minimum is realized at
$$ \bar{y}(y) = \begin{cases} y & \text{if } y < \kappa - \sqrt{2\eta}, \\ \kappa & \text{if } \kappa - \sqrt{2\eta} < y < \kappa, \\ y & \text{if } y > \kappa. \end{cases} \tag{7.96} $$
Hence, at $t = 1$,
$$ f(1, y) = \begin{cases} 1 & \text{if } y < \kappa - \sqrt{2\eta}, \\ \dfrac{(\kappa - y)^2}{2\eta} & \text{if } \kappa - \sqrt{2\eta} < y < \kappa, \\ 0 & \text{if } y > \kappa, \end{cases} \tag{7.97} $$
whence by differentiation with respect to $y$ we get the initial conditions $m(1, y)$ and $\chi(1, y)$ for the evolution in time $t$. Presuming that $P(t = 1, y)$ is known, we calculate the local stability distribution by applying the saddle point method to Eq. (7.79), yielding



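$$ p(y) = \begin{cases} P(1, y) & \text{if } y < \kappa - \sqrt{2\eta}, \\ 0 & \text{if } \kappa - \sqrt{2\eta} < y < \kappa, \\ P(1, y) + \delta(y - \kappa) \displaystyle\int_{\kappa - \sqrt{2\eta}}^{\kappa} d\bar{y}\, P(1, \bar{y}) & \text{if } y \ge \kappa. \end{cases} \tag{7.98} $$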



Thus in $p(y)$ a gap develops, but normalization is restored by the $\delta$-peak at $y = \kappa$. A similar feature was observed in various approximations, RS and 1-RSB, in previous works, but there the function appearing in the place of $P(1, y)$ was explicitly known. Note the singularities in $y$ in the initial conditions: these make numerical calculations more difficult, but do not alter the fact that for averaged quantities the limit $T \to 0$ is generically smooth.
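
The piecewise initial condition at $T = 0$ can be checked against a brute-force minimization. The sketch below assumes that the $T = 0$ initial field is $f(1, y) = \min_{\bar{y}} [\theta(\kappa - \bar{y}) + (\bar{y} - y)^2 / (2\eta)]$ with the convention $\theta(0) = 0$; this form of the minimization, the grid, and the function names are assumptions made for illustration.

```python
import math

def f_initial_T0(y, kappa, eta):
    """Piecewise minimum of theta(kappa - ybar) + (ybar - y)^2 / (2 eta) over ybar."""
    if y >= kappa:
        return 0.0                                # pattern already stable, ybar = y
    if y > kappa - math.sqrt(2.0 * eta):
        return (kappa - y) ** 2 / (2.0 * eta)     # cheaper to move to ybar = kappa
    return 1.0                                    # cheaper to stay and pay the error

def f_brute_force(y, kappa, eta, span=6.0, n=12001):
    """Grid minimization over ybar in [kappa - span, kappa + span].
    theta(0) = 0 is assumed, so ybar = kappa carries no error cost."""
    best = float("inf")
    for i in range(n):
        ybar = kappa - span + 2.0 * span * i / (n - 1)
        error = 1.0 if ybar < kappa else 0.0
        best = min(best, error + (ybar - y) ** 2 / (2.0 * eta))
    return best

if __name__ == "__main__":
    kappa, eta = 0.5, 0.3
    for y in (-1.0, 0.0, 0.2, 0.49, 1.0):
        print(y, f_initial_T0(y, kappa, eta), f_brute_force(y, kappa, eta))
```

The interval $\kappa - \sqrt{2\eta} < y < \kappa$, where the minimizer jumps to $\kappa$, is the origin of the gap and of the $\delta$-peak in $p(y)$.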

We do not elaborate more on the $T = 0$ case, because numerical evaluation cannot be avoided anyhow. But, due to the scaling described in Section VII A 9, the singularity of the $T \to 0$ limit has been lifted and both $T = 0$ and $T > 0$ can be treated within the same numerical framework.



2. The high temperature limit 



As reported in our Letter [17] and discussed for a general error measure $V(y)$ in Section VII A 8, in the limit when both the temperature $T$ and the relative number of examples $\alpha$ are large, much can be said by analytic treatment even about the CRSB states.

The general formulae were presented in Section VII A 8, where the effective free energy to be extremized was given as $\phi_1[x(q)]$ in Eq. (7.54b). The error measure under consideration determines the function $W(q)$ via (7.48). The simplest way to give $W(q)$ is by
$$ \frac{dW(q)}{dq} = \frac{1}{2\pi\sqrt{1-q^2}} \exp\left(-\frac{\kappa^2}{1+q}\right), \tag{7.99a} $$
$$ W(0) = H(-\kappa)^2, \tag{7.99b} $$
whence $W(q)$ and all its derivatives can be calculated.

The RS ansatz means that $x(q) = \theta(q - q_0)$. Using Eq. (7.54c) we get (the subscript of $q_0$ is omitted)
$$ \phi_1(q) = \frac{1}{2}\left[ \frac{q}{1-q} + \ln(1-q) + \gamma\,\big(W(1) - W(q)\big) \right], \tag{7.100} $$
and the stationarity condition reads as
$$ \frac{q}{(1-q)^2} = \frac{\gamma}{2\pi\sqrt{1-q^2}} \exp\left(-\frac{\kappa^2}{1+q}\right). \tag{7.101} $$
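
Since $W(q)$ is specified only through its derivative and its value at $q = 0$, a short numerical integration makes it concrete. The sketch below assumes $dW/dq = \exp(-\kappa^2/(1+q)) / (2\pi\sqrt{1-q^2})$ and $W(0) = H(-\kappa)^2$, and uses the substitution $q = \sin\vartheta$ to remove the integrable endpoint singularity at $q = 1$. Function names and the number of Simpson intervals are illustrative.

```python
import math

def H(x):
    """Gaussian tail H(x) = int_x^infinity Dz."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def W(q, kappa, n=2000):
    """W(q) = W(0) + int_0^q dW/dq' dq', integrated in theta = arcsin(q').
    With q' = sin(theta) the factor 1/sqrt(1 - q'^2) cancels against dq'."""
    upper = math.asin(q)
    if upper == 0.0:
        return H(-kappa) ** 2

    def integrand(theta):
        return math.exp(-kappa ** 2 / (1.0 + math.sin(theta)))

    h = upper / n                       # n must be even for Simpson's rule
    total = integrand(0.0) + integrand(upper)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * integrand(i * h)
    return H(-kappa) ** 2 + total * h / 3.0 / (2.0 * math.pi)

if __name__ == "__main__":
    for kappa in (0.0, 0.5, 1.0):
        print(kappa, W(0.0, kappa), W(0.5, kappa), W(1.0, kappa), H(-kappa))
```

Under the stated assumptions $W(1)$ coincides with $H(-\kappa)$, which is a convenient sanity check of the normalization.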



Local thermodynamical stability is determined from (7.57-7.59), thus the AT line is given by
$$ \lambda^{\rm RS}(q) = \lambda(q, q, q) = \frac{1}{(1-q)^2} - \frac{\gamma}{2\pi} \exp\left(-\frac{\kappa^2}{1+q}\right) \frac{\kappa^2 (1-q) + q\,(1+q)}{(1+q)^{5/2}\,(1-q)^{3/2}} = 0. \tag{7.102} $$

The 1-RSB ansatz is equivalent to $x(q) = (1 - x_1)\,\theta(q - q_1) + x_1\,\theta(q - q_0)$ and yields by Eq. (7.54b)
$$ \phi_1(q_0, q_1, x_1) = \frac{1}{2}\left\{ \frac{q_0}{1 - q_1 + x_1 (q_1 - q_0)} + \frac{1}{x_1} \ln\left[1 - q_1 + x_1 (q_1 - q_0)\right] - \frac{1 - x_1}{x_1} \ln(1 - q_1) + \gamma \left[ W(1) - (1 - x_1) W(q_1) - x_1 W(q_0) \right] \right\}. \tag{7.103} $$

The leading replicon eigenvalue is given by (7.102) with $q_1$ substituted, thus the boundary of local stability is
$$ \lambda^{\rm 1RSB}(q_1) = \lambda(q_1, q_1, q_1) = 0. \tag{7.104} $$
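
To see how an AT line of this kind can be traced in practice, the sketch below combines the RS stationarity condition with the vanishing of the replicon eigenvalue. It assumes the forms $q/(1-q)^2 = \gamma\, W'(q)$ and $1/(1-q)^2 = \gamma\, W''(q)$ with $W'(q) = \exp(-\kappa^2/(1+q))/(2\pi\sqrt{1-q^2})$; dividing the two eliminates $\gamma$ and leaves one equation for $q$. The overall normalization of $\gamma$ is an assumption of this sketch, while the value of $q$ on the line does not depend on it.

```python
import math

def at_point(kappa):
    """Return (q, gamma) on the RS AT line for given kappa.
    Dividing the two conditions gives
        q [kappa^2 (1 - q) + q (1 + q)] = (1 - q) (1 + q)^2,
    which is solved for q in (0, 1) by bisection."""
    def g(q):
        return q * (kappa ** 2 * (1.0 - q) + q * (1.0 + q)) \
            - (1.0 - q) * (1.0 + q) ** 2

    lo, hi = 0.0, 1.0                  # g(0) = -1 < 0 and g(1) = 2 > 0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    q = 0.5 * (lo + hi)
    dW = math.exp(-kappa ** 2 / (1.0 + q)) / (2.0 * math.pi * math.sqrt(1.0 - q * q))
    gamma = q / ((1.0 - q) ** 2 * dW)  # from the stationarity condition
    return q, gamma

if __name__ == "__main__":
    for kappa in (0.0, 0.5, 1.0, 2.0):
        print(kappa, at_point(kappa))
```

For $\kappa = 0$ the equation for $q$ factorizes to $(1+q)(2q^2 - 1) = 0$, so $q = 1/\sqrt{2}$ on the line.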



The classic Parisi phase, or SG-I, is characterized by the OPF (4.44). There $x_c(q)$ is the continuously increasing part of the OPF, for which we obtain from Eqs. (7.56, 7.99)
$$ x_c(q) = \frac{1}{2}\sqrt{\frac{2\pi}{\gamma}}\; \frac{\kappa^4 (q^2 - 2q + 1) + 2\kappa^2 (-2q^3 + q^2 + 2q - 1) + 2q^4 + 4q^3 + 3q^2 + 2q + 1}{(q^2 - q\kappa^2 + q + \kappa^2)^{3/2}\, (1-q)^{1/4}\, (1+q)^{3/4}}\; \exp\left(\frac{\kappa^2}{2(1+q)}\right). \tag{7.105} $$
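
The explicit OPF can be cross-checked numerically. The sketch below assumes that in the continuous part the replicon eigenvalue vanishes identically, $1/D_c(q)^2 = \gamma\, W''(q)$, and that $x_c(q) = -dD_c/dq$; it then compares a finite-difference derivative with the closed form. The normalization of $\gamma$ and all function names are assumptions of this sketch.

```python
import math

def d2W(q, kappa):
    """Second derivative of W(q) for the potential theta(kappa - y)."""
    n1 = kappa ** 2 * (1.0 - q) + q * (1.0 + q)
    return math.exp(-kappa ** 2 / (1.0 + q)) * n1 \
        / (2.0 * math.pi * (1.0 + q) ** 2.5 * (1.0 - q) ** 1.5)

def D_c(q, kappa, gamma):
    """D_c(q) from the marginality condition 1 / D_c^2 = gamma W''(q)."""
    return 1.0 / math.sqrt(gamma * d2W(q, kappa))

def x_c_closed(q, kappa, gamma):
    """Closed-form continuous part of the OPF."""
    k2 = kappa ** 2
    num = k2 * k2 * (q * q - 2.0 * q + 1.0) \
        + 2.0 * k2 * (-2.0 * q ** 3 + q * q + 2.0 * q - 1.0) \
        + 2.0 * q ** 4 + 4.0 * q ** 3 + 3.0 * q * q + 2.0 * q + 1.0
    den = (q * q - q * k2 + q + k2) ** 1.5 * (1.0 - q) ** 0.25 * (1.0 + q) ** 0.75
    return 0.5 * math.sqrt(2.0 * math.pi / gamma) \
        * math.exp(k2 / (2.0 * (1.0 + q))) * num / den

def x_c_numeric(q, kappa, gamma, h=1e-5):
    """x_c(q) = -dD_c/dq by a central finite difference."""
    return -(D_c(q + h, kappa, gamma) - D_c(q - h, kappa, gamma)) / (2.0 * h)

if __name__ == "__main__":
    for q in (0.3, 0.6, 0.8):
        print(q, x_c_closed(q, 0.5, 40.0), x_c_numeric(q, 0.5, 40.0))
```

Agreement of the two columns confirms the polynomial in the numerator term by term; it does not by itself fix the physical range $0 \le x_c \le 1$, which depends on $\gamma$.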



The interesting feature is that the OPF has an explicit and non-perturbative form. The perturbation is in $\beta$ now, and a small $\beta$ apparently does not make $x(q)$ degenerate. We shall need
$$ D(q) = \begin{cases} 1 - q & \text{if } q_{(1)} \le q \le 1, \\ D_c(q) = 1 - q_{(1)} + \displaystyle\int_q^{q_{(1)}} d\bar{q}\, x_c(\bar{q}) & \text{if } q_{(0)} \le q \le q_{(1)}, \\ D_c(q_{(0)}) & \text{if } 0 \le q \le q_{(0)}. \end{cases} \tag{7.106} $$



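The leading nontrivial term in the free energy, (7.54b), depends only on the endpoints of the interval as
$$ \phi_1(q_{(0)}, q_{(1)}) = \frac{1}{2}\left\{ \frac{q_{(0)}}{D_c(q_{(0)})} + \int_{q_{(0)}}^{q_{(1)}} \frac{dq}{D_c(q)} + \ln(1 - q_{(1)}) + \gamma \int_{q_{(0)}}^{q_{(1)}} dq\, x_c(q)\, \frac{dW(q)}{dq} + \gamma W(1) - \gamma W(q_{(1)}) \right\}. \tag{7.107} $$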



The replicon eigenvalues with identical arguments vanish due to the Ward-Takahashi identity, as described in Section VII A 4, so the SG-I phase is at best marginally stable. Nonlinear stability analysis is not available, but it is believed not to result in instability.

The fourth type of phase found here is a concatenation of a nontrivial plateau of $x(q)$, like in 1-RSB, and a continuously increasing $x_c(q)$. This CRSB spin glass state is also called SG-IV. The $x_c(q)$ is again given by (7.105), but extra variational parameters w.r.t. the classic Parisi phase (SG-I) should be introduced: the value $x_1$ of the plateau stretching from $q_{(0)}$ to $q_1$, and its upper border $q_1$. The OPF is given by (7.32) with $x_c(q)$ as in (7.105), and



$$ D(q) = \begin{cases} 1 - q & \text{if } q_{(1)} \le q \le 1, \\ D_c(q) = 1 - q_{(1)} + \displaystyle\int_q^{q_{(1)}} d\bar{q}\, x_c(\bar{q}) & \text{if } q_1 \le q \le q_{(1)}, \\ x_1 (q_1 - q) + D_c(q_1) & \text{if } q_{(0)} \le q \le q_1, \\ D_0 = D_c(q_1) + x_1 (q_1 - q_{(0)}) & \text{if } 0 \le q \le q_{(0)}. \end{cases} \tag{7.108} $$
The resulting free energy can be straightforwardly constructed from Eq. (7.54b) as
$$ \phi_1(q_{(0)}, q_1, q_{(1)}, x_1) = \frac{1}{2}\left\{ \frac{q_{(0)}}{D_0} + \frac{1}{x_1} \ln\left[ 1 + \frac{x_1 (q_1 - q_{(0)})}{D_c(q_1)} \right] + \int_{q_1}^{q_{(1)}} \frac{dq}{D_c(q)} + \ln(1 - q_{(1)}) + \gamma x_1 \big( W(q_1) - W(q_{(0)}) \big) + \gamma \int_{q_1}^{q_{(1)}} dq\, x_c(q)\, \frac{dW(q)}{dq} + \gamma W(1) - \gamma W(q_{(1)}) \right\}. \tag{7.109} $$

The specialty of the high-$T$ limit is that the numerical evaluation of all spin-glass-like phases involves extremization only in a few scalars, because $x_c(q)$ is explicitly known. This has been done in Ref. [17]; the results are demonstrated in the figures there, which we redisplay for illustration. In Fig. 10 the phase diagram is shown, with one RS region and three different types of RSB. If more than one of the ansatze (7.100, 7.103, 7.107, 7.109) worked, we considered as the averaged equilibrium state the one with the maximal free energy.




FIG. 10. Phase diagram for the potential $V(y) = \theta(\kappa - y)$ in the $(\gamma, \kappa)$ plane for high $T$ by numerical maximization of the free energy Eq. (7.54b) with the ansatze described in this section. The full lines separate phases with different types of global maxima. The RS, 1-RSB, SG-IV, and SG-I phases are indicated by a, b, c, and d, respectively. The AT curve is the RS phase boundary for $\kappa < \kappa_2 = 2.38$, and to the right of the arrow it analytically continues in the dashed line, no longer a phase boundary. Reprinted from Ref. [17].



It is a plausible conjecture that the RS phase is obtained by analytic continuation from the phase of perfect storage below capacity, $\alpha < \alpha_c(\kappa)$. So although at high temperatures there is no phase with zero error, the analogous phase is the one with RS (labeled by a). Note that at high $T$ we lose the intuitive picture, valid at $T = 0$, that increasing $\kappa$ takes us into the frustrated phase. We obtain three RSB phases. One is 1-RSB (b), another the classic Parisi CRSB (SG-I, labeled by d), and the third one is also CRSB, but with an extra plateau (SG-IV, labeled by c). The characteristic shapes of the OPF are shown in Fig. 11; note the plateau in the SG-IV phase (c).

FIG. 11. The $x(q)$ function at representative points as marked on Fig. 10 by crosses. Reprinted from Ref. [17].



Extensive thermodynamical quantities are shown in Fig. 12 for $\kappa = 0$. The entropy is negative and decreases, as it should, for increasing $\alpha$, i.e., increasing $\gamma$. The mean error per pattern in the high-$T$ limit is $1/2$, and our approximation gives the correction $\varepsilon_1 = T\,(1/2 - \varepsilon)$ from the formula (7.64b). For $\gamma \to \infty$ we expect that even the correction $\varepsilon_1$ vanishes; this is indeed suggested by the picture. The transition from RS to CRSB is of third order from the viewpoint of the free energy, i.e., its third derivative exhibits a discontinuity.

FIG. 12. The entropy $s$, in leading order $s_0$ from Eq. (7.65), the free energy term $\phi_1$ from Eq. (7.54b), and the enlarged correction $\varepsilon_1$ of the energy (7.64b) in the high $T$ limit, for $\kappa = 0$. The RS to SG-I transition is marked by an arrow. The dashed lines correspond to the thermodynamically unstable RS state beyond this transition point. The inset demonstrates the smoothness of the transition. Reprinted from Ref. [17].



If at $T = 0$ the region beyond capacity is a single CRSB phase (SG-I), for $T \to \infty$ the decomposition into three different phases suggests singular surfaces born at finite temperatures, whose precise locations we did not determine. For $\kappa < \kappa_2$ the transition from RS to CRSB SG-I is of similar type as at $T = 0$, but for $\kappa > \kappa_2$ we have a transition from RS into 1-RSB that does not have a counterpart at $T = 0$. Nevertheless, the main picture, namely, that for low $\alpha$ (which translates into low $\gamma$ here) there is a normal phase analogous to a paramagnet, and for large $\alpha$ the system exhibits complex, i.e., spin-glass-like behavior, is captured in the high-$T$ limit.



3. Numerical evaluation method for arbitrary temperatures* 



As demonstrated in Sect. VII A 2, the maximization of $f[x(q)]$ with respect to $x(q)$ in (7.19) is done with the side conditions that $f(q, y)$ satisfies the PPDE and $P(q, y)$ the SPDE. We further recall that $x(q) = 0$ for $q < q_{(0)}$ and $x(q) = 1$ for $q > q_{(1)}$, and that in the remaining non-trivial regime $q_{(0)} < q < q_{(1)}$ it is convenient to use for $x(q)$ the parametrization by $\xi(t)$ from (7.66b).

For an actual numerical implementation, $\xi(t)$ has to be rewritten once more in the form of some ansatz with a finite number $N$ of variational parameters $v_1, \dots, v_N$:



$$ \xi(t) = \xi(t; v_1, \dots, v_N). \tag{7.110} $$

For instance, $v_1, \dots, v_N$ may be the coefficients of a polynomial ansatz for $\xi(t)$. Another option would be a piecewise linear ansatz with $v_n = \xi(t = (n-1)/(N-1))$. Also a finite-step RSB ansatz with the steps and plateaus parametrized by the $v_n$ is possible. Given any such parametrization (7.110), we are left with maximizing the free energy functional with respect to the $N + 2$ variational parameters
$$ v = (v_0, v_1, \dots, v_{N+1}), \tag{7.111a} $$
$$ v_0 = q_{(0)}, \tag{7.111b} $$
$$ v_{N+1} = \eta = \beta\,(1 - q_{(1)}), \tag{7.111c} $$



where we have expressed $q_{(1)}$ through $\eta$ according to (7.66c). This maximization of the free energy functional $f[v]$ has to be performed under the (non-holonomic) constraints $x(q_{(0)}) \ge 0$, $x(q_{(1)}) \le 1$, and $\dot{x}(q) \ge 0$ for $q_{(0)} < q < q_{(1)}$, or equivalently, $\xi(0) \ge 0$, $\xi(1)/\beta\dot{q}(1) \le 1$ (cf. (7.66b)), and $\dot{\xi}(t)\,\dot{q}(t) - \xi(t)\,\ddot{q}(t) \ge 0$ for $0 < t < 1$. It is convenient to incorporate these constraints into an augmented free energy functional $f_\mu(v)$ in the form of soft penalty terms:
$$ f_\mu(v) = f[v] - \mu_0\, \psi(-\xi(0)) - \mu_1\, \psi\big(\xi(1)/\beta\dot{q}(1) - 1\big) - \mu_2 \int_0^1 dt\, \psi\big(\xi(t)\,\ddot{q}(t) - \dot{\xi}(t)\,\dot{q}(t)\big), \tag{7.112a} $$
$$ \psi(x) = x^2\, \theta(x)/2. \tag{7.112b} $$

Thus, by successively increasing the coefficients $\mu_0$, $\mu_1$, and $\mu_2$ in the course of the maximization procedure of $f_\mu(v)$, the respective constraints will be respected more and more rigorously.
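
The soft-penalty construction can be sketched in a few lines. The code below assumes the one-sided quadratic penalty $\psi(x) = x^2\theta(x)/2$ and three non-negative coefficients; the argument names (`xi0`, `xi1_over_beta_qdot1`, `defects`) and the simple Riemann sum for the integral term are hypothetical choices for illustration, not the authors' implementation.

```python
def psi(x):
    """One-sided quadratic penalty: zero when the constraint holds (x <= 0)."""
    return 0.5 * x * x if x > 0.0 else 0.0

def augmented_objective(f_value, xi0, xi1_over_beta_qdot1, defects, dt,
                        mu0, mu1, mu2):
    """Free energy value minus soft penalties.

    f_value              -- value of the unconstrained functional f[v]
    xi0                  -- xi(0), required to be >= 0
    xi1_over_beta_qdot1  -- xi(1) / (beta * dq/dt at t = 1), required to be <= 1
    defects              -- samples of xi * q'' - xi' * q' on a uniform t grid,
                            required to be <= 0 (monotonic x(q))
    dt                   -- grid spacing used for the Riemann sum
    """
    penalty = mu0 * psi(-xi0) + mu1 * psi(xi1_over_beta_qdot1 - 1.0)
    penalty += mu2 * dt * sum(psi(d) for d in defects)
    return f_value - penalty

if __name__ == "__main__":
    # Feasible point: no penalty is applied.
    print(augmented_objective(1.0, 0.1, 0.5, [-1.0, -2.0], 0.5, 10.0, 10.0, 10.0))
    # Slight violation of xi(0) >= 0: the objective is lowered smoothly.
    print(augmented_objective(1.0, -0.2, 0.5, [-1.0, -2.0], 0.5, 10.0, 10.0, 10.0))
```

Because the penalty and its first derivative vanish continuously at the boundary, a gradient-based maximizer sees no kink there, and raising the coefficients tightens the constraints gradually.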

Before we proceed, the following points are worth mentioning: (i) Like in Section VII A 9, our only assumption on $q(t)$ is that it should be a monotonically increasing function with $q(0) = q_{(0)}$ and $q(1) = q_{(1)}$. But for concrete numerical calculations, especially at low temperatures $T = \beta^{-1}$, the specific choice (7.66a) has proven to be particularly appropriate. In any case, the implicit dependence of $q(t)$ on the variational parameters $v_0 = q_{(0)}$ and $v_{N+1} = \eta$ should be kept in mind:
$$ q(t) = q(t; v_0, v_{N+1}). \tag{7.113} $$

(ii) In our experience, the maximization procedure typically ends not at the border of the admitted parameter regime, where the soft constraints (7.112a) come into action, but rather in the interior of this admitted region. However, in the course of the maximization this border may be visited, and, in the absence of the soft constraints in (7.112a), the maximization procedure often leaves the admitted region and eventually diverges. (iii) Strictly speaking, there are additional constraints on $v_0$ and $v_{N+1}$ associated with the restrictions $0 \le q_{(0)} \le q_{(1)} \le 1$; in our experience, however, they were never in danger of being violated, with the obvious exception of cases with a stable RS solution. (iv) As in any variational ansatz, the necessary number $N$ of parameters depends on how well the ansatz is adapted to the problem. In principle, a polynomial or piecewise linear ansatz (7.110) with a sufficiently large number $N$ of parameters can approximate any shape of $x(q)$ arbitrarily well. Whether or not $N$ is sufficiently large in a given case should follow from the accuracy with which the stationarity conditions (7.21, 7.22) are satisfied. In practice, unavoidable numerical inaccuracies make things more complicated. As has been observed already in Ref. [16] within a 2-RSB ansatz, in the neighborhood of its maximum the free energy functional $f_\mu(v)$ changes extremely little upon certain parameter variations, that is, the energy landscape $f_\mu(v)$ is very "flat" in certain directions. In our experience, with increasing number of parameters $N$ in (7.110), this problem becomes worse and worse in that the finite numerical accuracy gives rise to a spurious "roughness" in the already very "flat" energy landscape. As a consequence, any maximization strategy becomes slow or even fails for too large $N$. Similarly, the stability conditions are satisfied very well (in comparison with their numerical uncertainty) within a fairly large neighborhood of the true maximizing $x(q)$. As a consequence, in any specific case, a carefully tailored ansatz with not too many parameters has to be used, and the criterion for convergence should be that $q_{(0)}$, $\eta$, and $\xi(t)$ change negligibly upon refining the parametrization.



In order to maximize the augmented free energy functional (7.112a), a good compromise between robustness against the spurious numerical fine structure in the energy landscape and speed of convergence turned out to be a plain steepest descent procedure along the following lines: given a "working" parameter set $v$, the direction of the steepest increase of $f_\mu(v)$ is along the gradient $\partial f_\mu(v)/\partial v$. Taking into account all the implicit dependencies on $v$ in (7.110), (7.113) and the expression (7.20b) for the gradient of the original free energy functional, a straightforward but somewhat tedious calculation yields for the gradient of $f_\mu(v)$ from (7.112a) the result



dv 



1 F(t)Z(t) dq{t) ^ , M 1 dq(l) 



dt ■ 

2q{t) dq {0 ) ' <?(1) dq (a)
di + /io^'(-£(0))
F(l)



dv 



N+l 



dv n dv n 

1 F(t)gt) Bq(t) 
20q(t) dq (1) 

M(t) U(t)ML-m
(7.114a)



(7.114b) 



dq(t) 



[3dq (1) 



At 



where $1 \le n \le N$, $\psi'(x) = x\,\theta(x)$, we have introduced the quantities



(7.114c) 



(7.115a) 
(7.115b) 



and used $F(t)$ to denote the l.h.s. of Eq. (7.77) for a given function



Along the direction $\partial f_\mu(v)/\partial v$ of steepest increase, one now searches for the maximum, i.e., the expression $f_\mu\big(v + \lambda\, \partial f_\mu(v)/\partial v\big)$ has to be maximized with respect to $\lambda$. This implies the condition
$$ J(\lambda_{\max}) = 0 \tag{7.116} $$
for the maximizing $\lambda = \lambda_{\max}$, where
$$ J(\lambda) = \frac{\partial f_\mu\big(v + \lambda\, \partial f_\mu(v)/\partial v\big)}{\partial v} \cdot \frac{\partial f_\mu(v)}{\partial v}. \tag{7.117} $$
By updating the parameter set as
$$ v \to v + \lambda_{\max}\, \frac{\partial f_\mu(v)}{\partial v} \tag{7.118} $$
one completes one iteration step of the steepest descent procedure. This iteration scheme is then repeated until $v$ does not appreciably change any more. Note that due to the numerical inaccuracies it makes little sense to locate the zero from (7.116) very precisely in each iteration step. Our usual strategy was based on the assumption that $J(\lambda)$ behaves approximately linearly near its zero at $\lambda = \lambda_{\max}$. If $J(\lambda)$ is given at two nearby $\lambda$-values, one then obtains an approximation for $\lambda_{\max}$ by linear interpolation. One such readily available $J(\lambda)$-value is that for $\lambda = 0$; the second one follows by choosing for $\lambda$ the approximation for $\lambda_{\max}$ from the previous iteration step.
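
The iteration just described can be sketched as a gradient ascent with a one-step secant estimate of the line maximum. The code below is a toy version under stated assumptions: the objective is represented only by a user-supplied gradient function `grad`, the step variable is called `lam`, and the fallback for a non-decreasing $J$ is a hypothetical safeguard not taken from the text.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def steepest_ascent(grad, v, lam_prev=0.1, tol=1e-12, max_iter=500):
    """Maximize a function given its gradient.

    In each step J(lam) = grad(v + lam * g) . g is evaluated at lam = 0 and at
    the previous step length; the zero of the straight line through these two
    points is taken as the new step length (linear interpolation)."""
    v = list(v)
    for _ in range(max_iter):
        g = grad(v)
        j0 = dot(g, g)                      # J(0) = |gradient|^2
        if j0 < tol:
            break
        trial = [vi + lam_prev * gi for vi, gi in zip(v, g)]
        j1 = dot(grad(trial), g)            # J(lam_prev)
        if j0 - j1 > 0.0:
            lam = lam_prev * j0 / (j0 - j1)  # zero of the secant line
        else:
            lam = 2.0 * lam_prev            # hypothetical safeguard: J not decreasing
        v = [vi + lam * gi for vi, gi in zip(v, g)]
        lam_prev = lam
    return v

if __name__ == "__main__":
    # Toy concave objective f(v) = -(v0 - 1)^2 / 2 - 3 (v1 - 1)^2 / 2.
    toy_grad = lambda v: [-(v[0] - 1.0), -3.0 * (v[1] - 1.0)]
    print(steepest_ascent(toy_grad, [0.0, 0.0]))
```

For a quadratic objective $J(\lambda)$ is exactly linear, so the interpolation reproduces an exact line search; for the flat and numerically rough landscapes described above it is only an inexpensive approximation.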
4. The CRSB state



In Ref. [18] we presented some characteristic results, obtained by the method expounded in the previous section, for the error measure (7.94). In a non-exhaustive search we found that if the RS solution is AT-unstable, at $T = 0$ beyond capacity and also for some low temperatures, only a classic Parisi CRSB state emerges. Its OPF is given in (4.44), and is also denoted as SG-I. We conjecture that at $T = 0$ the region beyond capacity is such a phase. Sufficiently high temperatures, where the 1-RSB and the CRSB state with a plateau (SG-IV) would have arisen, as described in Section VII B 2, were not reached in our explorations.

The scaling introduced in Section VII A 9, and notably the introduction of the OPF $\xi(q) = \beta x(q)$, allows the description of the CRSB state at any temperature, at the same time maintaining a smooth transition to the ground state, $T = 0$. Physically, the fact that $x(q) \to 0$, at $T = 0$, for any $q < 1$ means that $q = 1$ with probability one. Thus freezing sets in, similarly to the ground state of the SK model. At the same time, the degenerate $x(q)$ is no longer a useful OPF, because the free energy becomes rather a functional of $\xi(q)$.

In Fig. 13 the scaled OPF $\xi(q) = \beta x(q)$ is displayed for various parameters.

FIG. 13. Scaled order parameter function $\xi(q)$ for $\kappa = 0$, $\alpha = 3$ at $T = 0$ (solid), $T = 0.01$ (dashed), and $T = 0.1$ (dotted). The first discontinuity is at $q_{(0)}$, below which the function is constantly zero. The second discontinuity for $T > 0$ is at $q_{(1)}$, which goes to 1 for $T \to 0$. Reprinted from Ref. [18].



All parameter settings are in the AT-unstable region. This figure is the first indication, to our knowledge, of Parisi's CRSB state for low temperatures in a system that is neither a model of long range interaction spin glasses nor closely related to one, such as the Little-Hopfield network. It is remarkable that the scaling by $\beta$ makes the continuously increasing segment $\xi_c(q) = \beta x_c(q)$ of the OPF little sensitive to the temperature. Equally stable is the lower end $q_{(0)}$ of the $\xi_c(q)$ segment, but the upper end $q_{(1)}$ shows linear temperature dependence, $1 - q_{(1)} \propto T$. The rightmost plateau's value is obviously $\xi(1) = \beta$.

At the same parameter settings as before the local stability density is displayed in Fig. 14.

FIG. 14. Density of local stabilities $p(\Delta)$ from theory for $\kappa = 0$, $\alpha = 3$ at $T = 0$ (solid), $T = 0.01$ (dashed), and $T = 0.1$ (dotted). Reprinted from Ref. [18].



Since in the method of Section VII B 3 the evaluation of the probability field $P(q, y)$ by the scaled SPDE (7.73) is done in every approximant step, we obtain the sought field in the end by (7.79). Not shown is the Dirac delta peak at $T = 0$, which restores normalization to one there. A gap exists at $T = 0$, with right border $\Delta = \kappa$, in accordance with (7.98), but the gap immediately disappears for any positive $T$, as can be seen from (7.79). At $T = 0$ the density $p(\Delta)$ vanishes linearly at the lower edge of the gap.

Comparison between the CRSB solution and earlier RS, 1-RSB [7], and 2-RSB [16] approaches shows that averaged quantities, like the mean error per pattern, do not show significant differences. The qualitative behavior of the error, namely that it is zero below and positive beyond capacity at $T = 0$, and furthermore that it increases linearly for small $\alpha - \alpha_c$, is reflected by the previous solutions. The 1- and 2-RSB $\varepsilon(\alpha)$ curves look the same at the resolution of a figure. On the other hand, the difference is more conspicuous in the distribution of non-self-averaging quantities. The OPF $x(q)$ is the averaged probability measure of the overlap of coupling vectors, and the definitely continuously increasing part of it in Figs. 13 and 11 shows that finite $R$-RSB-s are qualitatively in error. A further qualitative difference can be found in the distribution of local stabilities $p(\Delta)$. Indeed, for finite $R$-RSB the $p(\Delta)$ exhibits a discontinuity at the lower edge of the gap. The right tendency is shown by the feature that the size of the discontinuity is smaller in the 1-RSB than in the RS solution.



5. Simulation 



In this section we describe the simulation results from [18]. Wendemuth adapted existing algorithms for below capacity of the simple perceptron, with potentials of the form (3.9), to the region beyond it by specially dealing with patterns with positive stabilities [154], and performed a series of simulations [137]. The most sensitive part of his work was on the potential with $b = 0$, which counts the number of unstable patterns, an NP-complete problem from the algorithmic viewpoint [153]. His data showed significant deviation from the then available best theoretical prediction from the 1-RSB calculation of Majer, Engel, and Zippelius [7]. He evaluated the probability density of local stabilities at $\alpha = 1$ and $\kappa = 1$, a point known to be beyond capacity. Although the shapes were roughly similar, with a gap and a peak at its right end both present, the simulation data gave systematically and discouragingly larger stabilities than predicted by theory.

Essentially following Wendemuth's algorithm we redid the simulation in order to see how persistent the deviation is. The first step is to generate random patterns (3.2). We selected numbers with uniform distribution from an interval centered around zero and in the end normalized them as
$$ \sum_{k=1}^{N} \left(S_k^\mu\right)^2 = N. \tag{7.119} $$



The outputs for the patterns were taken uniformly 1, which does not restrict generality, for the $S_k^\mu$ have random signs. The algorithm goes in discrete time $t = 0, 1, \dots$. We initialized at $t = 0$ the coupling vector according to the Hebb rule
$$ J_k(0) = \text{const.} \sum_{\mu=1}^{M} S_k^\mu, \tag{7.120} $$
with the constant chosen so that the Euclidean norm was $|\mathbf{J}(0)| = N$. At time $t$ the local stabilities
$$ \Delta^\mu(t) = \frac{\mathbf{J}(t) \cdot \mathbf{S}^\mu}{|\mathbf{J}(t)|} \tag{7.121} $$
are computed and among the unstable ones, i.e., $\Delta^\mu(t) < \kappa$, the one with the largest $\Delta^\mu(t)$ is selected. This is the least unstable pattern, characterized by the index $\mu_0(t)$. The couplings are updated according to the rule of Wendemuth. We took



$$ \mathbf{J}(t+1) = \mathbf{J}(t) + \lambda \left( \mathbf{S}^{\mu_0(t)} + \Delta\mathbf{S}(t) \right) $$



where 



AS(t) = 



I if A«>(*)(f) > 

m N/\J(t)\-A"QV(t) i{A Mt)( t ) < 



(7.122) 



(7.123) 



Here $\lambda$ is the gain parameter, chosen in Ref. [154] as $\lambda = N^{-3/2}$. By trial and error we found that a larger gain parameter $\lambda = N^{-1}$ did not endanger overall convergence, and made the final approach for a given pattern, $\Delta^{\mu_0(t)}(t) \to \kappa$, faster. The second row in the update rule (7.123) is Wendemuth's term introduced to specially cope with patterns with negative stability.

At the next time step $t + 1$ we again find the least unstable pattern with index $\mu_0(t+1)$ and update the couplings by the above rule. The usual course of the algorithm is that the least unstable pattern is the same, $\mu_0(0) = \mu_0(1) = \cdots$, until it becomes stabilized at, say, $t_1 - 1$, whence another pattern is taken for some steps, $\mu_0(t_1) = \mu_0(t_1 + 1) = \cdots$, again until it becomes stabilized. In principle, another pattern may become least unstable before the one in question is stabilized, but typically this was not the case.

The above recipe is repeated until a pattern cannot be stabilized in a reasonable time. The notion of reasonable 
time could be quantified, because the time needed to stabilize a pattern showed a systematic increase as function of 
the total number of patterns stabilized before. Therefore, it is a good recipe to halt the algorithm, when a pattern 
cannot be stabilized within a small multiple of the extrapolated convergence time. In test runs, if the last pattern 
could not be stabilized within twice the extrapolated convergence time, it could not within ten times of the same 
either. Thus we are confident that we exploited the possibilities of the update rule described above. 

Wendemuth's algorithm is based on the argument that, among all patterns with $\Delta^\mu < \kappa$, one has the highest chance to stabilize the pattern whose $\Delta^\mu$ is closest to $\kappa$. So this algorithm may maximize the number of stable patterns, by successively pushing the stability of the least unstable pattern to $\kappa$ from below. A consequence is that the remaining non-stabilized patterns with $\Delta^\mu < \kappa$ will have a relatively large distance $\kappa - \Delta^\mu$, but the latter quantity does not enter the present error measure. Nevertheless, the principle of stabilizing the least unstable pattern qualitatively resembles the gradient descent algorithm for differentiable error measures, because every step is made in the momentarily most promising direction. The shortcomings of such algorithms in NP-complete problems are known, and we cannot be certain that the number of unstable patterns is indeed minimized.
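
The procedure described above can be sketched in pure Python for small systems. This is a simplified illustration, not the code used for the reported results: it keeps only the plain update $\mathbf{J} \to \mathbf{J} + \lambda\,\mathbf{S}^{\mu_0}$ (the extra term for patterns with negative stability is omitted), and the stopping rule is a fixed cap on steps (`max_steps_per_pattern`, `max_total_steps`), which are hypothetical parameters standing in for the extrapolated convergence time.

```python
import math
import random

def simulate(N, M, kappa, gain=None, max_steps_per_pattern=2000,
             max_total_steps=20000, seed=0):
    """Least-unstable-pattern learning; returns (fraction unstable, stabilities)."""
    rng = random.Random(seed)
    gain = 1.0 / N if gain is None else gain

    # Random patterns, uniform around zero, normalized so that sum_k S_k^2 = N.
    patterns = []
    for _ in range(M):
        s = [rng.uniform(-1.0, 1.0) for _ in range(N)]
        scale = math.sqrt(N / sum(x * x for x in s))
        patterns.append([x * scale for x in s])

    # Hebb initialization, rescaled to norm N as in the text.
    J = [sum(p[k] for p in patterns) for k in range(N)]
    norm = math.sqrt(sum(x * x for x in J))
    J = [x * N / norm for x in J]

    def stabilities():
        norm_j = math.sqrt(sum(x * x for x in J))
        return [sum(j * s for j, s in zip(J, p)) / norm_j for p in patterns]

    current, steps, total = None, 0, 0
    while total < max_total_steps:
        delta = stabilities()
        unstable = [mu for mu in range(M) if delta[mu] < kappa]
        if not unstable:
            break
        mu0 = max(unstable, key=lambda mu: delta[mu])   # least unstable pattern
        steps = steps + 1 if mu0 == current else 1
        current = mu0
        if steps > max_steps_per_pattern:               # give up on this pattern
            break
        for k in range(N):                              # simplified update rule
            J[k] += gain * patterns[mu0][k]
        total += 1

    delta = stabilities()
    return sum(1 for d in delta if d < kappa) / M, delta

if __name__ == "__main__":
    error, _ = simulate(N=50, M=50, kappa=1.0, max_total_steps=5000)
    print("fraction of unstable patterns:", error)
```

The returned fraction is an upper estimate of the minimal error, for the reasons given in the text: the search is greedy and deterministic.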

The result of the simulation at the parameter setting $\alpha = \kappa = 1$ is shown in Fig. 15. Since $\kappa > 0$, in the final approach $\Delta^{\mu_0(t)} \to \kappa$ for the momentarily least unstable pattern the stability is positive, so the second row in the update rule (7.123) does not come into play.

FIG. 15. Density of local stabilities $p(\Delta)$ at $\alpha = \kappa = 1$. The horizontal axis is $\Delta$, the vertical one $p$. The theoretical prediction is given by the full line. The two empirical densities are normalized histograms, taken with $M = N = 500$ and 1000. Reprinted from Ref. [18].



The full line is the result of numerical extremization of the variational free energy (7.19) by the method explained in Section VII B 3. We omitted the Dirac delta peak of the theoretical probability density at $\kappa = 1$. The dashed lines are the histograms for the local stabilities from simulation for two sizes, $M = N = 500$ and 1000, with proper normalization. We do not enclose the original data of Wendemuth [137], but mention that his histogram showed a much larger systematic error. To quantify the deviation let us consider the mean error $\varepsilon$, i.e., the relative number of misclassified patterns. Wendemuth's number is 0.21, the present simulation gives 0.15, while theory predicts 0.1358. Thus we are still about 10% off the theoretical value, but it is a remarkable improvement w.r.t. the previous deviation of 55%. The size of the gap from simulation is also within about 10% of the theoretical value. The simulation data reproduce, for the larger size $M = N = 1000$, the property that the density $p(\Delta)$ vanishes linearly at the lower edge of the gap. This should be contrasted with the 1-RSB result in Ref. [7], where the size of the discontinuity at the lower edge of the gap is about a third of the height of the left peak. The simulation clearly favors the CRSB solution.

In summary, the theoretical and simulation data do not match perfectly; however, given the NP-completeness of the numerical problem, this does not disprove the theory. We mention that the algorithm used had the primitive side of being deterministic; furthermore, it does not have a rigorous mathematical basis for convergence to the desired state. There is obviously room for further improvements.



VIII. THE NEURON: INDEPENDENTLY DISTRIBUTED SYNAPSES 



A. Free energy and stationarity condition 



In this paper we focus mostly on the spherical neuron. Since, however, the main formulas for the case of the prior distribution (3.6), where synapses are independent and obey an arbitrary distribution, follow straightforwardly from Section IV, we now briefly review them. In the course of continuation the limits (8.1a, 8.1b) are assumed. The corresponding free energy (3.22) can be characterized by two OPF-s
$$ q(x), \quad q(0) = q_{(0)}, \quad q(1) = q_{(1)}, \tag{8.2a} $$
$$ \hat{q}(x), \quad \hat{q}(0) = \hat{q}_{(0)}, \quad \hat{q}(1) = \hat{q}_{(1)}. \tag{8.2b} $$
Alternatively, we can take as OPF-s the respective inverses
$$ x(q), \quad \hat{x}(\hat{q}). \tag{8.3} $$
Then
$$ \hat{q} = \hat{q}(x(q)), \tag{8.4} $$
or its inverse function
$$ q = q(\hat{x}(\hat{q})) \tag{8.5} $$
establishes a relation between the overlaps $q$ and $\hat{q}$.

Concerning t he /e term, Eqs . ( |7.1| - [7.9[ ) from the spherical case carry over unchanged. The entropic term fl3.22d ) is 
a transcript of (]4 . 4 1 ) with (f4.3| ) together with the appropriate equations that produce the averages. We introduce the 
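field (see Eqs. (4.40a), (4.40b)) 

(8.6) 

to get 

$$ f_s[\hat x(\hat q)] = \lim_{n\to 0}\frac{1}{n} f_s(\hat Q) = -\beta^{-1}\,\varphi\!\left[\ln\int du\, w_0(u)\, e^{-\beta u y},\ \hat Q\right] = \hat f(0,0), \tag{8.7} $$

where $\hat f(\hat q, y)$ is the solution of 

$$ \partial_{\hat q}\hat f = -\tfrac12\partial_y^2\hat f + \tfrac12\beta\hat x\,(\partial_y\hat f)^2, \tag{8.8a} $$
$$ \hat f(\hat q_{(1)}, y) = -\beta^{-1}\ln\int Dz\int du\, w_0(u)\, e^{-\beta u\left(y + i z\sqrt{\hat q_{(1)}}\right)}. \tag{8.8b} $$

Introducing 

$$ \hat m(\hat q, y) = \partial_y\hat f(\hat q, y), \tag{8.9} $$

we have 

$$ \partial_{\hat q}\hat m = -\tfrac12\partial_y^2\hat m + \beta\hat x\,\hat m\,\partial_y\hat m, \tag{8.10a} $$
$$ \hat m(\hat q_{(1)}, y) = \partial_y\hat f(\hat q_{(1)}, y). \tag{8.10b} $$

Furthermore, the hatted 'susceptibility field' is 

$$ \hat\chi(\hat q, y) = \partial_y\hat m(\hat q, y), \tag{8.11} $$

obeying 

$$ \partial_{\hat q}\hat\chi = -\tfrac12\partial_y^2\hat\chi + \beta\hat x\left(\hat m\,\partial_y\hat\chi + \hat\chi^2\right), \tag{8.12a} $$
$$ \hat\chi(\hat q_{(1)}, y) = \partial_y^2\hat f(\hat q_{(1)}, y). \tag{8.12b} $$

The probability density $\hat P(\hat q, y)$ satisfies a variant of the SPDE 

$$ \partial_{\hat q}\hat P = \tfrac12\partial_y^2\hat P + \beta\hat x\,\partial_y(\hat P\hat m), \tag{8.13a} $$
$$ \hat P(0, y) = \delta(y). \tag{8.13b} $$

The interaction term (3.22c) is simplest if expressed through the functions (8.2) 

$$ f_i[x(q), \hat x(\hat q)] = -\frac{\beta}{2}\int_0^1 dx\, q(x)\,\hat q(x). \tag{8.14} $$

Since a function is a functional of its inverse, $f_i[\ldots]$ can be considered as a functional of $x(q)$ and $\hat x(\hat q)$. 

The stationarity conditions (3.24), (3.25) now read as 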



$$ q = \int dy\, \hat P(\hat q, y)\, \hat m(\hat q, y)^2, \tag{8.15a} $$
$$ \hat q = \alpha\int dy\, P(q, y)\, m(q, y)^2, \tag{8.15b} $$

where the connection between $q$ and $\hat q$ is established by (8.4) or (8.5). The r. h. sides are respective functionals of $\hat x(\hat q)$ 
and $x(q)$. Note that solving these equations also involves finding the starting point $\hat q_{(1)}$, in contrast to the evaluation 
of the energy term, where the initial condition is fixed at $q = 1$. Given the solution for the stationary $x(q)$ and $\hat x(\hat q)$, 
by substituting them into the r. h. s. of 

$$ f = f_s[\hat x(\hat q)] + f_i[x(q), \hat x(\hat q)] + \alpha f_e[x(q)] \tag{8.16} $$

we obtain the final result for the mean free energy. 

A special case of independently distributed synapses is the clipped neuron, i. e., the neuron with discrete synapses. 
The most studied such model is the Ising neuron with binary synapses, which has attracted considerable interest (see 
[11] and [12] for references). The prior distribution in the Ising case involves (3.7), so the initial conditions for the 
PDE-s are 

$$ \hat f(\hat q_{(1)}, y) = -\beta^{-1}\ln\cosh\beta y + \tfrac12\beta\hat q_{(1)}, \tag{8.17a} $$
$$ \hat m(\hat q_{(1)}, y) = -\tanh\beta y. \tag{8.17b} $$

The Ising neurons studied in the literature so far were reminiscent of the random energy model in that they involved 
at most 1-RSB [231]. However, only a few choices of the error measure potential $V(y)$ were considered, and at this 
stage it cannot be excluded that the full Parisi scheme becomes important for other potentials. 
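
The Ising initial conditions can be checked numerically against a general prior. The sketch below assumes that the initial condition of the entropic PDE, after the Gaussian integral is carried out, has the form $-\beta^{-1}\ln\int du\, w_0(u)\exp(-\beta u y - \frac12\beta^2\hat q_{(1)}u^2)$; this form is our reading of the general formula, and all function names are hypothetical. A discrete prior is represented as a list of (value, weight) pairs.

```python
import math

# Ising prior: u = -1 or +1 with equal weight (list of (value, weight) pairs).
ISING_PRIOR = [(-1.0, 0.5), (1.0, 0.5)]

def initial_free_energy(y, beta, q1_hat, prior):
    """-(1/beta) ln sum_u w0(u) exp(-beta*u*y - 0.5*beta^2*q1_hat*u^2).

    Assumed form of the initial condition for a discrete prior.
    """
    z = sum(w * math.exp(-beta * u * y - 0.5 * beta**2 * q1_hat * u * u)
            for u, w in prior)
    return -math.log(z) / beta

def initial_magnetization(y, beta, q1_hat, prior):
    """y-derivative of initial_free_energy: the weighted mean of u."""
    weights = [w * math.exp(-beta * u * y - 0.5 * beta**2 * q1_hat * u * u)
               for u, w in prior]
    return sum(u * p for (u, _), p in zip(prior, weights)) / sum(weights)

def ising_closed_form(y, beta, q1_hat):
    """Closed forms for the Ising prior: free energy and its y-derivative."""
    f = -math.log(math.cosh(beta * y)) / beta + 0.5 * beta * q1_hat
    m = -math.tanh(beta * y)
    return f, m

if __name__ == "__main__":
    for y in (-1.3, 0.0, 0.7):
        print(y, initial_free_energy(y, 1.7, 0.35, ISING_PRIOR),
              ising_closed_form(y, 1.7, 0.35)[0])
```

Replacing `ISING_PRIOR` by another list of (value, weight) pairs gives the corresponding initial condition for other clipped neurons, under the same assumption.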

Finally, we emphasize that previously studied SK-type spin glasses, with Ising or, as a matter of fact, any kind of 
individual spin constraint (3.6), are included in the considerations of this section. Formulae equivalent to (8.17) 
are well known from the SK model. This is not surprising, because the entropic term with the Ising constraint is the 
same for both the SK spin glass and the present neuron model. In the case of other constraints for the SK-type spin 
glass the first two terms on the r. h. s. of (8.16) remain valid. Concerning the third, the energy term, let us assume 
the most general multi-spin interaction of Nieuwenhuizen [68], resulting in a term (3. 48). Then indeed, if in (8.16) 
the $\alpha f_e$ is replaced by the energy term (6.51), with the understanding of the correspondence (6.49,3.50), one obtains 
by (8.16) the full free energy functional of the spin glass problem. 



B. Variational principle 



The results of Section VIII A can also be derived from a variational principle. A reasoning similar to what we 
followed in the spherical problem yields a free energy functional $f[\ldots]$ that produces the mean free energy as 

$$ f = \max_{x(q)}\ \mathop{\rm extr}_{f(q,y),\,P(q,y)}\ \mathop{\rm extr}_{\hat x(\hat q)}\ \mathop{\rm extr}_{\hat f(\hat q,y),\,\hat P(\hat q,y)}\ f\big[x(q), f(q,y), P(q,y), \hat x(\hat q), \hat f(\hat q,y), \hat P(\hat q,y)\big]. \tag{8.18} $$

The order of the extremum conditions is not binding, but, given the physical meaning of the OPF $x(q)$, the maximum 
is to be taken last. The free energy functional is 



$$ f[\ldots] = f_s[\ldots] + f_s^{(1)}[\ldots] + f_s^{(2)}[\ldots] + f_i[\ldots] + \alpha\left(f_e[\ldots] + f_e^{(1)}[\ldots] + f_e^{(2)}[\ldots]\right), \tag{8.19a} $$
$$ f_s[\ldots] = \hat f(0,0), \tag{8.19b} $$
$$ f_s^{(1)}[\ldots] = \int_0^{\hat q_{(1)}} d\hat q\int dy\, \hat P(\hat q, y)\left[\partial_{\hat q}\hat f(\hat q, y) + \tfrac12\partial_y^2\hat f(\hat q, y) - \tfrac12\beta\hat x(\hat q)\left(\partial_y\hat f(\hat q, y)\right)^2\right], \tag{8.19c} $$
$$ f_s^{(2)}[\ldots] = -\int dy\, \hat P(\hat q_{(1)}, y)\left[\beta^{-1}\ln\int du\, w_0(u)\, e^{-\beta u y - \frac12\beta^2\hat q_{(1)}u^2} + \hat f(\hat q_{(1)}, y)\right], \tag{8.19d} $$

where $f_e[\ldots]$, $f_e^{(1)}[\ldots]$, $f_e^{(2)}[\ldots]$, and $f_i[\ldots]$ are given by Eqs. (7.24c), (7.24d), (7.24e), and (8.14), respectively. The 
remarkable symmetry of the above expressions in the quantities with and without the hat is the consequence of 
the fact that both the entropic and the energy terms are essentially of the general Parisi form, the main difference 
being in the starting point of the time variable and the initial condition for the PDE. 

The free energy functional $f[\ldots]$ should be maximized in $x(q)$ and extremized over the other function arguments. 
Besides the extremization of terms analogous to those appearing in Section VII A 5, we have to calculate the functional 
derivatives of $f_i[x(q), \hat x(\hat q)]$. If $u^{-1}$ is the inverse of some function $u(t)$, then variation of the identity $u^{-1}(u(t)) = t$ 
yields the functional derivative of the inverse function as 

$$ \frac{\delta u^{-1}(u(t_1))}{\delta u(t_2)} = -\frac{\delta(t_2 - t_1)}{\dot u(t_1)}. \tag{8.20} $$

This relation helps us to calculate the sought derivatives of the interaction term 

$$ \frac{\delta f_i[x(q), \hat x(\hat q)]}{\delta x(q)} = \frac{\beta}{2}\,\hat q(x(q)), \tag{8.21a} $$
$$ \frac{\delta f_i[x(q), \hat x(\hat q)]}{\delta\hat x(\hat q)} = \frac{\beta}{2}\, q(\hat x(\hat q)). \tag{8.21b} $$
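
The variation leading to the functional derivative of the inverse function can be spelled out as follows. This is a sketch in our own notation, assuming $u(t)$ is differentiable and strictly monotonic; $(\delta u^{-1})(\cdot)$ denotes the change of the inverse function at a fixed value of its argument. The last line applies the result to an integral of the type that appears in the interaction term, with $u = x(q)$ and $u^{-1} = q(x)$, without assuming any particular prefactor.

```latex
% Vary u -> u + \delta u in the identity u^{-1}(u(t)) = t.
\begin{align}
 0 &= (\delta u^{-1})(u(t_1))
      + \left.\frac{d u^{-1}}{d u}\right|_{u(t_1)} \delta u(t_1),
 \qquad
 \left.\frac{d u^{-1}}{d u}\right|_{u(t_1)} = \frac{1}{\dot u(t_1)}, \\
 \Rightarrow\quad
 \frac{\delta u^{-1}(u(t_1))}{\delta u(t_2)}
   &= -\frac{\delta(t_2 - t_1)}{\dot u(t_1)}, \\
 % Application: change variables x = x(q_1), dx = \dot x(q_1)\, dq_1.
 \delta \int_0^1 dx\, q(x)\, \hat q(x)
   &= \int dq_1\, \dot x(q_1)\, \hat q(x(q_1))
      \left(-\frac{\delta x(q_1)}{\dot x(q_1)}\right)
    = -\int dq_1\, \hat q(x(q_1))\, \delta x(q_1).
\end{align}
```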



These, together with functional derivatives of the type determined in Section VII A 3, lead to the stationarity relations 
displayed in Section VIII A for intervals of strictly increasing OPF-s, including the points where the OPF-s exhibit a 
step. Plateaus should be dealt with in a manner similar to what was described in Section VII A 3. Extremization in 
terms of the starting point in time of the hatted PDE, $\hat q_{(1)}$, yields 

(8.22) 



£(?(!)) «(i) = / dy P{q(i),y)x{q(i),y), 



a condition which was not displayed in Section VIII A. 



C. On thermodynamical stability 



In the case of independently distributed synapses the free energy (3.22) involves combined maximization and 
extremization. Clearly, there are no stability requirements following from the 'extr' condition. On a simple example 
we now give the recipe for stability calculations in the replicon sector for such a case. 

Consider the two-variable function 

$$ F(x, \hat x) = f(x) + \hat f(\hat x) + x\hat x, \tag{8.23} $$

where $f$ and $\hat f$ are real functions. We are seeking 

$$ \min_x\ \mathop{\rm extr}_{\hat x}\ F(x, \hat x). \tag{8.24} $$

Extremum conditions for $x$ and $\hat x$ imposed simultaneously would read as 

$$ x = -\hat f'(\hat x), \tag{8.25a} $$
$$ \hat x = -f'(x). \tag{8.25b} $$

Substitution of the stationary value (8.25b) gives 

$$ F(x) = F(x, -f'(x)) = f(x) + \hat f(-f'(x)) - x f'(x). \tag{8.26} $$

The stationarity condition in terms of $x$ is 

$$ F'(x) = -f''(x)\left[x + \hat f'(-f'(x))\right] = 0. \tag{8.27} $$

The stationary point $x + \hat f'(-f'(x)) = 0$ is a minimum of $F(x)$ if 

$$ F''(x)\Big|_{x = -\hat f'(-f'(x))} = -f''(x)\left[1 - \hat f''(\hat x)\, f''(x)\right] > 0, \tag{8.28} $$

where (8.25a), (8.25b) are understood. If we have more-than-one-dimensional objects in lieu of $x$ and $\hat x$, and at the 
saddle the Hessian matrices of $f$ and $\hat f$ can be simultaneously diagonalized, then a similar formula holds, wherein the 
appropriate eigenvalues of the Hessian at stationarity should be substituted for $f''(x)$ and $\hat f''(\hat x)$. 
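
The curvature formula of the two-variable example can be checked numerically. The sketch below uses two test functions chosen by us purely for illustration, $f(x) = -\cosh x$ and $\hat f(\hat x) = \hat x^2/4$, for which $x = 0$ is a stationary point; it compares a finite-difference second derivative of the reduced function $F(x) = f(x) + \hat f(-f'(x)) - x f'(x)$ with the closed expression $-f''(x)[1 - \hat f''(\hat x) f''(x)]$.

```python
import math

# Hypothetical test functions (not from the text) and their derivatives.
def f(x): return -math.cosh(x)
def df(x): return -math.sinh(x)
def d2f(x): return -math.cosh(x)

def f_hat(z): return 0.25 * z * z
def df_hat(z): return 0.5 * z
def d2f_hat(z): return 0.5

def reduced_F(x):
    """F(x) = f(x) + f_hat(-f'(x)) - x f'(x), i.e. x_hat = -f'(x) substituted."""
    return f(x) + f_hat(-df(x)) - x * df(x)

def curvature_formula(x):
    """Closed form -f''(x) [1 - f_hat''(x_hat) f''(x)] with x_hat = -f'(x)."""
    x_hat = -df(x)
    return -d2f(x) * (1.0 - d2f_hat(x_hat) * d2f(x))

def second_derivative(func, x, h=1e-4):
    """Central finite-difference estimate of func''(x)."""
    return (func(x + h) - 2.0 * func(x) + func(x - h)) / (h * h)

x_star = 0.0  # stationary point: x + f_hat'(-f'(x)) vanishes here
stationarity_residual = x_star + df_hat(-df(x_star))
numeric = second_derivative(reduced_F, x_star)
analytic = curvature_formula(x_star)
print(stationarity_residual, numeric, analytic)
```

For these test functions the curvature is positive, so the stationary point is a minimum of the reduced function, in line with the criterion. The closed form is meant to hold only at stationarity, which is why the comparison is made at `x_star`.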

The above principle can be applied to the case of independently distributed synapses. There are two families 
of replicon eigenvalues, $\lambda_e(q_1, q_2, q_3)$ and $\lambda_s(\hat q_1, \hat q_2, \hat q_3)$, coming from the energy and entropic term, respectively. The 
contribution from the energy term, $\lambda_e(q_1, q_2, q_3)$, is the same as in the spherical case, given by Eq. (7.29). Analogously, 
from the hatted entropic term we have 

$$ \lambda_s(\hat q_1, \hat q_2, \hat q_3) = -\beta^2\int dy_2\, dy_3\ \Gamma_{\hat\varphi\hat\varphi\hat\varphi}(\hat q_1; 0, 0; \hat q_2, y_2; \hat q_3, y_3)\ \hat\chi(\hat q_2, y_2)\ \hat\chi(\hat q_3, y_3). \tag{8.29} $$

Here we spell out the obvious, namely, that the hatted PPDE gives rise to a hatted Green function, see Section IV B 4, whence 
the vertex function $\Gamma_{\hat\varphi\hat\varphi\hat\varphi}$ can be defined. Finally, based on (8.28), the necessary criterion of stability becomes 

$$ \lambda_e(q_1, q_2, q_3)\left[1 - \alpha\,\lambda_e(q_1, q_2, q_3)\,\lambda_s(\hat q_1, \hat q_2, \hat q_3)\right] \ge 0. \tag{8.30} $$

Here we have omitted a prefactor $\alpha$ and allowed equality for the sake of possible Goldstone modes in a Parisi phase. 
The stability condition is of course understood at stationarity, which yields a concrete $\hat q = \hat q(q)$ function. That implies 
$\hat q_i = \hat q(q_i)$, so the overall replicon eigenvalue is parametrized by three independent variables, as in the spherical case. 



IX. CONCLUSIONS AND OUTLOOK 



The main messages of this paper were extensively discussed and conclusions were advanced in Chs. I and II, so we 
only highlight a few moral issues here. 

A sensitive question in approximating a CRSB state by a finite R-RSB is how good the approximation will turn out to be in the end. 
However, there is so far no reliable a priori estimate of this error, as opposed to, say, a series expansion, where the 
last power retained gives at least asymptotically a bound for the error. Sometimes there is a qualitative indicator 
showing that a low order approximation is wrong. A long known example is the ground state entropy of the SK 
model, which was negative for finite R-RSB ansätze, a problem cured only by Parisi's CRSB solution. However, 
macroscopic quantities are often quite well approximated by the RS or a low R-RSB solution. The main advantage 
of the CRSB calculation w. r. t. the approximations is that the latter may not be able to predict, even qualitatively, 
the distributions of local, non-self-averaging quantities, like the overlaps and local fields. These are observables 
in numerical simulations and can help to decide between candidate theories (see, e. g., the numerical review [91] on 
spin glasses). 

On the technical side, the mathematical framework discussed in Chs. IV and V relates to the general properties 
of CRSB phases, irrespective of the storage problem of the neuron, upon which its use was demonstrated subsequently. 
It allows for a non-perturbative description in a wide range of problems in disordered systems, like long range 
interaction spin glasses other than the SK model, and may be a starting point for the study of frustrated phases, 
i. e., unsatisfiable situations, in optimization problems in general. 

Among the notoriously difficult problems in artificial neural networks is the problem of learning and generalization 
of unlearnable tasks [199, 232]. In the traditional scenario of equilibrium, i. e., batch, learning from examples, an 
unlearnable problem is characterized by the fact that there is a limited number of examples the network can reproduce. 
Beyond this limit of error-free learning, the generalization ability might be further improved, but the minimal training 
error is positive. This is in close analogy with the region beyond capacity in the storage problem, so it is a sensible 
assumption that theoretical methods able to deal with imperfect storage may also be of use for the description of 
learning the unlearnable. A further possible area of application is unsupervised learning [232], where no desired output 
is given; rather, the properties of the distribution of examples are to be extracted. Again, if the network can be saturated 
by the examples, a complex phase appears, where methods similar to those presented in this paper may be the key to 
the solution. 



APPENDIX A: ABBREVIATIONS 

This Appendix lists acronyms and abbreviations used throughout the paper. 

left-hand-side                              l. h. s. 
right-hand-side                             r. h. s. 
with respect to                             w. r. t. 
de Almeida-Thouless [stability]             AT 
Green function                              GF 
Order parameter function                    OPF 
Partial differential equation               PDE 
Parisi's PDE, Eq. (4.39)                    PPDE 
Replica symmetry breaking                   RSB 
R-step RSB                                  R-RSB 
Continuous RSB                              CRSB 
Sherrington-Kirkpatrick [model]             SK 
Sompolinsky's PDE, Eqs. (4.53), (4.54)      SPDE 
Ward-Takahashi identity                     WTI 



APPENDIX B: DERIVATION OF THE REPLICA FREE ENERGY 



The n-th moment of the partition function (3.10), with (3.3) inserted as constraint, reads as 



(Bl) 



The indices $k$, $\mu$, and $a$ run from 1 to $N$, $M$, and $n$, respectively. The Fourier transformation of the Dirac deltas 
introduces the ancillary variables $x_a^\mu$ adjoint to $y_a^\mu$. The average over the Gaussian distribution of the patterns $S_k^\mu$, and over 
the outputs $\xi^\mu$, which are ±1 equally likely, can be performed straightforwardly. In fact, since $S_k^\mu$ is scaled by the 
vanishing factor $N^{-1/2}$, the same result would be obtained for other distributions of $S_k^\mu$ with zero mean and unit 
variance, independent of $\mu$ and $k$. 



(z n ) 



;i<i v -/„ w(j a ) ii 



2tt 



J] exp I -(3 V(yZ) + £ ix£tf - ^ E « E J ^ Jbk ) 

fj. \ a a a,b k I 



(B2) 



If we substitute the overlaps defined in (3.18), the product over $\mu$ gives the $M = \alpha N$-th power of $e^{-\beta f_e(Q)}$, where 
$f_e(Q)$ is displayed in (3.17d). Inserting the constraint (3.18) yields 



(Z n ) 



l[d N J a w(J a ) 



Y[ Ndq ab 6 I Nq ab - J akJbk I 
a<b \ k / 



(B3) 



For both the spherical (3.5) and the independent (see the condition below (3.6)) prior distributions we have $q_{aa} = q_a = 1$. 
Fourier transformation of the Dirac deltas introduces the variables $\tilde q_{ab}$ adjoint to $q_{ab}$, and we have 



(Z n ) 



J\ j-dq ab dq ai 



l[d N J a w(J a 

a 

exp iN^2q ab q ab - i ^ ^ q ab J ak J bk 

\ a<b k a<b / 



(B4) 



We shall see that for the prior densities of interest, after integration over the synaptic coefficients, the exponential 
has the overall coefficient $N$. Then for $N \to \infty$ the saddle point method can be applied, and it will turn out that the 
stationary value of each $\tilde q_{ab}$ is imaginary. We presume that the prior density is such that the integration path of 
$\tilde q_{ab}$ can be distorted so as to go through the imaginary saddle point. The path can then be taken as a straight line parallel 
or perpendicular to the imaginary axis in a sufficiently large neighborhood of the saddle point, depending on which 
orientation ensures a maximum at the saddle. This procedure is typical if one integrates a fast oscillating integrand; 
then only an extremum, not specifically a minimum, condition should be satisfied. If we succeed in determining the 
saddle values of the $\tilde q_{ab}$-s as functions of the $q_{ab}$-s, we have to minimize in terms of the $q_{ab}$-s. We shall see that this can 
be carried out for the spherical constraint (3.5), but we cannot explicitly determine the $\tilde q_{ab}$-s for general independent 
synapses (3.6). 

In the case of the spherical constraint (3.5) we shall make use of the advance knowledge that the stationary values 
of $\hat q_{ab} = i\tilde q_{ab}$ are real. Let us insert the Fourier transform of the Dirac deltas representing the spherical constraints, 
thereby introducing the integration variables $\tilde q_a$, then switch over to $\hat q_a = i\tilde q_a$ to obtain 



(Z n ) =C N N^ 21 (2n)- !li2 ^ 11 



dq a b dq a b 



.a<b 



l[d N J a dq a 



-Na0f e (Q) 



exp N ^ labqab + N '^2<la~'^2 q a J ak - X! X! lo-bJakhk 



(B5) 



a<b 



k a<b 



We can introduce diagonal elements for the matrix $\hat Q$ as $\hat q_{aa} = 2\hat q_a$. Performing the Gaussian integrals over the $J_{ak}$-s we 
obtain 



(Z n ) =C n N^- L (2tt) a 



Y[dq a 

a<b 



]Jdq a 



exp TV ^-a/3/ e (Q) + ^TrQQ - ilndctQ^ 



(B6) 



Given the asymptotics of the prefactor 

$$ \frac{\ln C_N}{N} \to -\frac12\ln 2\pi e, \tag{B7} $$

in the large $N$ limit we have by the saddle point method the free energy 



f = H "^flC 1 ~ ^ = Hm n " ^ ^ / ^ Q ' Q)' 
nivp n—>0 n q q 

- 77 1 - 1 

/(Q, Q) = ^ + «/e(Q) - ^TrQQ + ^IndetQ. 



(B8a) 
(B8b) 



By using (3.20), the extremum condition for the matrix elements $\hat q_{ab}$ results in $\hat Q^{-1} = Q$. Substitution thereof into 
(B8b) gives the spherical free energy (3.17). 

A similar derivation yields the free energy (3.22) for the prior distribution (3.6) of independent synapses. There we 
use $\hat q_{ab} = -i\beta^{-2}\tilde q_{ab}$ and obtain 



(Z r< 



\ [ dq ab dq a b 



.a<b 



,-N a 0f e (Q) 



exp [-Nf3 2 Y, q a bq a b 



a<b 



Y\_ dJ a Wo(Ja) exp (3 2 ^ QabJaJb } 
a J \ a <b ! ) 



N 



(B9) 



The ancillary matrix $\hat Q$ has vanishing diagonal elements, $\hat q_{aa} = 0$; with that in mind we recover the expression (3.22) 
for the free energy. The ancillary matrix elements cannot, in general, be eliminated as easily as in the spherical case. 



APPENDIX C: DERIVATION OF THE R-RSB FREE ENERGY TERM 



This Appendix contains the few steps that lead to the R-RSB free energy term, starting out from Eq. (4.11). The 
integrals therein taken over the variables $x_a$ yield Dirac deltas, which fix the values of the $y_a$-s. The $j_r$ indices can be 
understood as follows. Assume as usual that each $m_r$ is a divisor of $n$. The ordered sequence of integers $1, \ldots, n$ is 
divided into $n/m_r$ "boxes", each containing $m_r$ integers. Then the index $j_r$ enumerates those boxes. Given $1 \le a \le n$, 
for each $r$, the $j_r(a)$ labels the box that contains $a$, that is, 

$$ j_r(a) = \left[\frac{a-1}{m_r}\right] + 1, \tag{C1} $$

where $[\ldots]$ denotes the integer part. Then the coefficient of an $x_a$ in the first term of the exponent in (4.11) is 
characterized by the $j_r(a)$-s. That way we arrive at 



R+l n/rn T 

n n < } 



'R+l 



(C2) 



r=0 j r = l 

Note that $m_{R+1} = 1$ and $a = j_{R+1}(a)$; we will substitute $j_{R+1}$ for $a$. The integrals over the $z^{(R+1)}_{j_{R+1}}$-s factorize as 

n/rriR+i 



a n<p[&(y),q,Tn] _ 



R n/m r 

n n °i r) 

r=0 j r = l 



x, exp 



n / d 4T 



(C3) 



The functions $j_r(j_{R+1})$, $r \le R$, are step-like in that they are constant for $m_R/m_{R+1}$ different $j_{R+1}$-s belonging to 
the same box of length $m_R$. Integrations over the $z^{(R+1)}_{j_{R+1}}$-s associated with the same box give identical results. Different 
integrals are characterized by different $j_R$-s; this can be given as the new argument for the rest of the indices as $j_r(j_R)$, 
$r < R$. We then have 



ntp[<P(y),q,m] _ 



R n/ m r 

nn° 

r=0 j r =l 



n 

JR=1 



y DZfl+i CXp Z >0h) Vlr - 9r-l + ^fl+l\/«R+l = 



(C4) 



Again, integration over the $z^{(R)}_{j_R}$-s gives the same value for those $j_R$-s that define the same $j_r(j_R)$, $r \le R-1$. These can 
be characterized by $j_{R-1}$, and one obtains 



R-l n/rn r 


n/m H _i 






n n D 4 r) 


n 






r=0 j r =l 


JR-1 = 1 







'R-l R+l 
Jr) 



exp<2> Z j r (3B-i)V^ ~1r-X+ Z rV^r- Qr-1 



\r=0 r=R / . 

The expression can be rolled up by continuing the above reasoning and we arrive at 



_ m n-i 



(C5) 



nv>[*(y),9,m] 



Dz 



Dzi 



Dz 2 



/ Dzr+i exp^ z rV /q r - q r -i 

•* \r=0 



(C6) 
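
The box bookkeeping used at the start of this Appendix can be illustrated with a few lines of code. This is a minimal sketch, assuming the box label is the integer part of $(a-1)/m_r$ plus one; the function names and the example block sizes (with $m_0 = n$ and $m_{R+1} = 1$) are our own choices for illustration.

```python
def box_index(a, m_r):
    """1-based label j_r(a) of the box of length m_r that contains a (1-based)."""
    return (a - 1) // m_r + 1

def box_labels(n, block_sizes):
    """For each replica index a = 1..n, the list of labels j_r(a), one per m_r."""
    for m_r in block_sizes:
        if n % m_r != 0:
            raise ValueError("each m_r must be a divisor of n")
    return {a: [box_index(a, m_r) for m_r in block_sizes]
            for a in range(1, n + 1)}

# Example: n = 12 replicas, block sizes m_0 = 12, m_1 = 6, m_2 = 2, m_3 = 1.
labels = box_labels(12, (12, 6, 2, 1))
for a in (1, 6, 7, 12):
    print(a, labels[a])
```

With block size 1 the label coincides with the replica index itself, which is the substitution of the innermost box index for $a$ used in the text.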
APPENDIX D: DERIVATION OF THE PPDE BY CONTINUATION 



To the author's knowledge, Ref. [224] is considered to be the only publication on the derivation of the PPDE. 
However, we were not able to reproduce the derivation from that article; furthermore, [224] required $R \to \infty$ and 
$q_r - q_{r-1} \to 0$, conditions which we did not find necessary to prescribe. 

In essence, [224] proposes an iteration in a direction that is opposite to that of the recursion (4.15). We were unable 
to reconstruct that, mostly because the starting term was not known. In other words, we evaluated the free energy 
term (4.11) starting from $r = R+1$, while [224] did so from $r = 0$ (in our notation). 

When $q_r - q_{r-1} \to 0$ is assumed, our recursion yields the PPDE in the spirit of Ref. [224]. We use the identity 



(Dl) 



to rewrite (4.15a) into 



,!(Sr 



(D2) 



In order to produce a PDE from the recursion, the assumption of ordering for the $q_r$-s is necessary. We can then relegate 
the dependence on the index $r$ to dependence on the variable $q = q_r$. Continuation is then performed by replacing 
$q_r$ by $q$, and $\psi_r(y)$ by $\psi(q, y)$. We allow for nontrivial limits $q_{(0)}$ and $q_{(1)}$ as introduced in (4.42). The conditions (4.6) 
and (4.13) ensure monotonicity of $x(q)$. If we assume a smooth $x(q)$, i. e., that all $q_r - q_{r-1} \to 0$ and $x_r - x_{r-1} \to 0$ 
for $1 \le r \le R+1$, then an expansion of (D2) in the differences to lowest nontrivial order yields for $\psi(q, y)$ the PDE 
(4.34) in the interval $(q_{(0)}, q_{(1)})$. 

As we found in Section IV A 2, Eq. (4.34) and, equivalently, the PPDE (4.39), stands even if $x(q)$ is not smooth, 
with the right interpretation of (4.34) at discontinuities of $x(q)$. On the other hand, the author gladly acknowledges 
that the way he first obtained the PPDE for the general free energy term (4.1) was in the spirit of the above discussed 
derivation of Ref. [224]. 



APPENDIX E: MULTIDIMENSIONAL GENERALIZATION OF THE PPDE 

We consider here the generalized free energy term 

x ex p ( j EL El =1 - 1 5X =1 E[, 6=1 «S«S4) . ( E1 ) 

where the order parameter matrix has now extra indices 

$$ [Q]^{kl}_{ab} = q^{kl}_{ab}. \tag{E2} $$

Such a situation occurs, for instance, in the treatment of thermodynamical states in vector spin glasses, or of the 
metastable states in the SK model. When counting the stationary states of the Thouless-Anderson-Palmer equations, 
Bray and Moore fed] encountered Eq. (E1) with $K = 2$ and a special $\Phi$. They displayed the corresponding PPDE but 
did not pursue the matter further. Since Eq. (E1) is a straightforward generalization of the Parisi term, we briefly 
give the way to evaluate it. Also, we concisely formulate the calculation of replica correlators. 

The assumption of the Parisi structure for all individual submatrices of $Q$ with fixed $k, l$ can be cast into the form 

$$ Q = \sum_{r=0}^{R+1}\left(Q_r - Q_{r-1}\right)\otimes U_{m_r}\otimes 1_{n/m_r}. \tag{E3} $$

Here 

$$ [Q_{r(a,b)}]^{kl} = q^{kl}_{r(a,b)} = q^{kl}_{ab} \tag{E4} $$

is the symmetric $K \times K$ matrix analog of (5.14). The quadratic form in the exponent in (E1) is now 



$$ \sum_{r=0}^{R+1}\ \sum_{k,l=1}^{K}\left(q_r^{kl} - q_{r-1}^{kl}\right)\sum_{j_r=1}^{n/m_r}\ \sum_{a=m_r(j_r-1)+1}^{j_r m_r}\ \sum_{b=m_r(j_r-1)+1}^{j_r m_r} x_a^k\, x_b^l, \tag{E5} $$

with $Q_{-1} = 0$. Let us diagonalize the difference between subsequent $Q_r$-s as 

$$ Q_r - Q_{r-1} = O_r^{T}\,\Lambda_r\, O_r, \qquad r = 0, \ldots, R+1, \tag{E6} $$

where the orthogonal $K \times K$ matrix $O_r$ is made up of the column eigenvectors of $Q_r - Q_{r-1}$, and $\Lambda_r$ is diagonal and has 
the real eigenvalues as diagonal elements. A derivation similar to that given in Section IV A and Appendix C yields 
the R-RSB term 



tp[$(y),{<l?},x]\ n= , 



D K z ln / T) n Z\ 



B K z 2 . 



r ( R+1 I 

J \r=0 



(E7) 



Here $\Lambda_r^{1/2}$ has the square roots of the eigenvalues (possibly also imaginary numbers, the sign being irrelevant) as 
diagonal elements, $D^K z$ denotes the $K$-dimensional Gaussian integration measure, and $z_r$ is a $K$-dimensional vector. 
The function $\Phi(y^1, \ldots, y^K)$ is naturally abbreviated by $\Phi(\mathbf y)$. The recursion 



(E8a) 
(E8b) 



VV-i(y) = J V K z ip r ( y + zA?0, 



evaluates ( |E7|) as 



<p[$(y),{q?},x 



Xl 



B K z \nip [zAgO 



(E9) 



In order to produce a PDE we need to specify a time-like variable. For practical purposes we consider the case 
when one diagonal element is a known constant, say $q^{11}_{R+1} = 1$. Then we pick $q_r^{11}$ as time variable, call its continuation 
$q$, and obtain the PDE for the field $\psi(q, \mathbf y)$ in $K$ spatial dimensions as 



lKl,l/) = e*<*>. 



(ElOa) 
(ElOb) 



Here the dot means derivative in terms of $q$, of course $[\dot Q]^{11} = 1$, and $q$ evolves from 1 to 0. As in the case with one 
spatial dimension, in the $q$-intervals $(q_{(1)}, 1)$ and $(0, q_{(0)})$ we have $x(q) = 1$ and $x(q) = 0$, resp., where $q_{(1)} = q_R$ and 
$q_{(0)} = q_0$. Again, by introducing 

$$ \varphi(q, \mathbf y) = \frac{\ln\psi(q, \mathbf y)}{x(q)} \tag{E11} $$

we obtain the $K$-dimensional PPDE as 

(E12a) 
$$ \varphi(1, \mathbf y) = \Phi(\mathbf y). \tag{E12b} $$

Then the sought term is 

$$ \varphi[\Phi(\mathbf y), \{q_r^{kl}\}, x]\Big|_{n=0} = \varphi(0, \mathbf 0). \tag{E13} $$

The evolution in the interval $(q_{(1)}, 1)$ can be solved explicitly to give 

$$ \psi(q_{(1)}, \mathbf y) = \int\frac{d^K v\, d^K w}{(2\pi)^K}\exp\left[\Phi(\mathbf v) + i\,\mathbf w(\mathbf y - \mathbf v) - \tfrac12\,\mathbf w\left(Q_{R+1} - Q_R\right)\mathbf w\right], \tag{E14} $$

that is the initial condition for further evolution in $(0, q_{(1)})$. From the mathematical viewpoint, the problem of 
existence of the above expression needs to be clarified for the specific $\Phi$ in play. It typically occurs that a diagonal 
element of $Q_{R+1}$ is known to vanish, but for other $r$-s the same diagonal element is positive. In general, $Q_{R+1} - Q_R$ is not 
necessarily a positive definite matrix. However, given the fact that Eq. (E14) at $\mathbf y = 0$ is the RS free energy (where 
$q_{(1)}$ is replaced by the RS value of $q$), on physical grounds we surmise that the divergence of the integral is a rare 
threat. 

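In the present case there are $\frac12 K(K+1)$ OPF-s, namely, $x(q)$ and $q^{kl}(q)$, $(k, l) \ne (1, 1)$ and $k, l \le K$. 
Expectation values 

$$ \langle\langle A(\{x_a^k\})\rangle\rangle \tag{E15} $$

we conveniently define by inserting the function $A$ in the integrand of (E1) and omitting the $\frac1n\ln$ from in front 
of the formula. The $n \to 0$ limit is understood. As in one spatial dimension, the GF $\mathcal G_\varphi(q_1, \mathbf y_1; q_2, \mathbf y_2)$ for the 
multidimensional PPDE is a key help in calculating averages of common occurrence. The GF is zero for $q_1 > q_2$ and 
satisfies the PDE 

$$ \partial_{q_1}\mathcal G_\varphi = -\tfrac12\nabla_{y_1}\dot Q\,\nabla_{y_1}\mathcal G_\varphi - x(q_1)\left(\nabla_{y_1}\varphi(q_1, \mathbf y_1)\right)\dot Q\,\nabla_{y_1}\mathcal G_\varphi - \delta(q_1 - q_2)\,\delta^K(\mathbf y_1 - \mathbf y_2). \tag{E16} $$

Special significance is attached to 

$$ P(q, \mathbf y) = \mathcal G_\varphi(0, \mathbf 0; q, \mathbf y), \tag{E17} $$

a natural generalization of the $K = 1$ field. Let us introduce the derivative fields 

$$ \mu^k(q, \mathbf y) = \partial_{y^k}\varphi(q, \mathbf y), \tag{E18a} $$
$$ \kappa^{kl}(q, \mathbf y) = \partial_{y^k}\partial_{y^l}\varphi(q, \mathbf y). \tag{E18b} $$

Then we can write the two-replica-correlator 

$$ \frac{\partial\varphi[\Phi(\mathbf y), Q]}{\partial q^{kl}_{ab}} = \langle\langle x_a^k x_b^l\rangle\rangle = C_x^{kl}(q_{r(a,b)}) \tag{E19} $$

as 

$$ C_x^{kl}(q) = \int d^K y\ P(q, \mathbf y)\left[\mu^k(q, \mathbf y)\,\mu^l(q, \mathbf y) + \theta(q - 1^{-0})\,\kappa^{kl}(q, \mathbf y)\right]. \tag{E20} $$

By use of this formula the stationarity conditions for a free energy that contains a term like (E1) can be immediately 
constructed. 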



APPENDIX F: AN IDENTITY BETWEEN GREEN FUNCTIONS 



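In this Appendix we prove the identity (4.87). The r. h. s. of 

$$ \begin{aligned} \partial_q\Gamma_{\varphi\varphi\varphi}(q; \{q_i, y_i\}_{i=1}^3) = \int dy\ \Big( & [\partial_q\mathcal G_\varphi(q_1, y_1; q, y)]\ \mathcal G_\varphi(q, y; q_2, y_2)\ \mathcal G_\varphi(q, y; q_3, y_3) \\ & + \mathcal G_\varphi(q_1, y_1; q, y)\ [\partial_q\mathcal G_\varphi(q, y; q_2, y_2)]\ \mathcal G_\varphi(q, y; q_3, y_3) \\ & + \mathcal G_\varphi(q_1, y_1; q, y)\ \mathcal G_\varphi(q, y; q_2, y_2)\ [\partial_q\mathcal G_\varphi(q, y; q_3, y_3)]\Big) \end{aligned} \tag{F1} $$

can be expressed by making use of the PDE-s for the participating GF-s. From (4.77) we have 

$$ \partial_q\mathcal G_\varphi(q_1, y_1; q, y) = \tfrac12\partial_y^2\mathcal G_\varphi(q_1, y_1; q, y) - x(q)\,\partial_y\left[\mu(q, y)\,\mathcal G_\varphi(q_1, y_1; q, y)\right] + \delta(q - q_1)\,\delta(y - y_1), \tag{F2} $$

and for $i = 2, 3$ (4.76) holds as 

$$ \partial_q\mathcal G_\varphi(q, y; q_i, y_i) = -\tfrac12\partial_y^2\mathcal G_\varphi(q, y; q_i, y_i) - x(q)\,\mu(q, y)\,\partial_y\mathcal G_\varphi(q, y; q_i, y_i) - \delta(q - q_i)\,\delta(y - y_i). \tag{F3} $$

Let us substitute the r. h. sides of the above PDE-s into (F1). The sum of the terms linear in $x(q)$ turns out to be 
a derivative by $y$, so, under the plausible condition that the GF-s decay for large $|y|$, integration by $y$ gives zero. 
The second derivatives in $y$ also cancel after partial integration but for a remnant that yields 

$$ \begin{aligned} \partial_q\Gamma_{\varphi\varphi\varphi}(q; \{q_i, y_i\}_{i=1}^3) = & \int dy\ \mathcal G_\varphi(q_1, y_1; q, y)\ [\partial_y\mathcal G_\varphi(q, y; q_2, y_2)]\ [\partial_y\mathcal G_\varphi(q, y; q_3, y_3)] \\ & + \delta(q - q_1)\ \mathcal G_\varphi(q_1, y_1; q_2, y_2)\ \mathcal G_\varphi(q_1, y_1; q_3, y_3) \\ & - \delta(q - q_2)\ \mathcal G_\varphi(q_1, y_1; q_2, y_2)\ \mathcal G_\varphi(q_2, y_2; q_3, y_3) \\ & - \delta(q - q_3)\ \mathcal G_\varphi(q_1, y_1; q_3, y_3)\ \mathcal G_\varphi(q_3, y_3; q_2, y_2). \end{aligned} \tag{F4} $$

Eq. (4.75) relates derivatives of $\mathcal G_\varphi$ and $\mathcal G_\mu$, whence we obtain (4.87) for $q_1 < q < q_2$ and $q_1 < q < q_3$. 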



APPENDIX G: PDE-S FOR HIGH TEMPERATURE 



Here we record the calculation leading to the lowest order nontrivial correction for the distribution of local stabilities 
at high temperatures. Assuming $P(q, y) = P_0(q, y) + \beta P_1(q, y) + O(\beta^2)$ and expanding the SPDE we obtain 

$$ \partial_q P_0 = \tfrac12\partial_y^2 P_0, \qquad P_0(0, y) = \delta(y), \tag{G1a} $$
$$ \partial_q P_1 = \tfrac12\partial_y^2 P_1 + x\,\partial_y(P_0 m_0), \qquad P_1(0, y) = 0. \tag{G1b} $$

Here $m_0(q, y)$ is the lowest order approximation for the field $m(q, y)$ in (7.5), thus it satisfies (7.6) with $\beta = 0$, 
i. e., evolves according to pure diffusion. Using its initial condition $m_0(1, y) = V'(y)$ we get 

$$ m_0(q, y) = \int dy_1\ G(y - y_1, 1 - q)\ V'(y_1). \tag{G2} $$

The zeroth order probability field is obviously 

$$ P_0(q, y) = G(y, q), \tag{G3} $$

while the next correction can be obtained from (G1b) using the Gaussian GF for pure diffusion. This gives (in case 
of ambiguity $q^{-0}$ should be understood in the upper limit of integration) 

$$ \begin{aligned} P_1(q, y) &= \int_0^q dq_1\int dy_1\ G(y - y_1, q - q_1)\ x(q_1)\ \partial_{y_1}\left[G(y_1, q_1)\ m_0(q_1, y_1)\right] \\ &= -\int_0^q dq_1\int dy_1\ \left[\partial_{y_1}G(y - y_1, q - q_1)\right]\ x(q_1)\ G(y_1, q_1)\ m_0(q_1, y_1) \\ &= \partial_y\int_0^q dq_1\ x(q_1)\int dy_1\, dy_2\ V'(y_2)\ G(y_1, q_1)\ G(y - y_1, q - q_1)\ G(y_2 - y_1, 1 - q_1). \end{aligned} \tag{G4} $$

In the last Eq. the formula (G2) for $m_0$ was also substituted. We note the elementary identity 

$$ \int dy\ \prod_{i=1}^{3} G(y - y_i, q_i) = \frac{1}{2\pi\sqrt{a}}\exp\left(-\frac{A}{2a}\right), \tag{G5} $$

where 

$$ A = y_1^2(q_2 + q_3) + y_2^2(q_1 + q_3) + y_3^2(q_1 + q_2) - 2y_1 y_2 q_3 - 2y_2 y_3 q_1 - 2y_3 y_1 q_2, \tag{G6a} $$
$$ a = q_1 q_2 + q_2 q_3 + q_3 q_1. \tag{G6b} $$

Hence, at $q = 1$, we obtain 

$$ P_1(1, y) = \partial_y\int_0^1 dq\ x(q)\int dy_1\ V'(y_1)\ \frac{1}{2\pi\sqrt{1 - q^2}}\exp\left(-\frac{y^2 + y_1^2 - 2 y y_1 q}{2(1 - q^2)}\right). \tag{G7} $$

This is identical to the function $\rho_1(y)$ given in Eq. (7.62b). 
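
The elementary identity for the integral of a product of three Gaussians can be checked numerically. The sketch below assumes that $G(y, q)$ is the normalized Gaussian of variance $q$, as suggested by its role as the diffusion GF with a delta-function initial condition; the function names, the integration window and the sample arguments are our own choices.

```python
import math

def gauss(y, q):
    """Normalized Gaussian of zero mean and variance q (assumed form of G)."""
    return math.exp(-y * y / (2.0 * q)) / math.sqrt(2.0 * math.pi * q)

def product_integral(ys, qs, half_width=30.0, steps=60000):
    """Trapezoid estimate of the integral over y of prod_i G(y - y_i, q_i)."""
    h = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps + 1):
        y = -half_width + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        term = 1.0
        for y_i, q_i in zip(ys, qs):
            term *= gauss(y - y_i, q_i)
        total += weight * term
    return total * h

def closed_form(ys, qs):
    """exp(-A / (2a)) / (2 pi sqrt(a)) with A and a as in the identity."""
    y1, y2, y3 = ys
    q1, q2, q3 = qs
    a = q1 * q2 + q2 * q3 + q3 * q1
    A = (y1 * y1 * (q2 + q3) + y2 * y2 * (q1 + q3) + y3 * y3 * (q1 + q2)
         - 2 * y1 * y2 * q3 - 2 * y2 * y3 * q1 - 2 * y3 * y1 * q2)
    return math.exp(-A / (2.0 * a)) / (2.0 * math.pi * math.sqrt(a))

ys, qs = (0.3, -1.1, 0.7), (0.4, 0.6, 0.6)
print(product_integral(ys, qs), closed_form(ys, qs))
```

Choosing the three variances as $q_1$, $1 - q_1$, $1 - q_1$ and one center at zero gives $a = 1 - q_1^2$, which is the combination entering the final expression at $q = 1$.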
APPENDIX H: LONGITUDINAL STABILITY FOR HIGH TEMPERATURES 



Below we show that the linear operator displayed in (7.60) has all negative eigenvalues on the space of smooth 
functions $\zeta(q)$ with $\xi(0) = \xi(1) = 0$. Consider the eigenvalue problem 

$$ \int_0^1 dq_2\ \zeta(q_2)\int_0^{\min(q_1, q_2)}\frac{dq}{D(q)^3} = \Lambda\,\zeta(q_1), \tag{H1} $$

where we omitted the factor $-1$ on the r. h. s. of (7.60), so the positivity of the $\Lambda$-s is to be proven. The l. h. s. 
separates as 

$$ \int_0^{q_1} dq_2\ \zeta(q_2)\int_0^{q_2}\frac{dq}{D(q)^3} + \int_{q_1}^{1} dq_2\ \zeta(q_2)\int_0^{q_1}\frac{dq}{D(q)^3}. \tag{H2} $$

The first term is equivalently 

$$ \int_0^{q_1}\frac{dq}{D(q)^3}\int_q^{q_1} dq_2\ \zeta(q_2), \tag{H3} $$

which concatenates with the second term in (H2) to 

$$ \int_0^{q_1}\frac{dq}{D(q)^3}\int_q^{1} dq_2\ \zeta(q_2). \tag{H4} $$

Introducing 

$$ \xi(q) = \int_q^1 dq_2\ \zeta(q_2), \tag{H5} $$

we obtain after differentiation of the eigenvalue problem (H1) the equivalent form 

$$ \Lambda\,\xi''(q) = -\frac{\xi(q)}{D(q)^3}. \tag{H6} $$

This equation may have a solution that vanishes at the boundaries only if $\Lambda > 0$. Indeed, one can try to solve (H6) by 
the "shooting method", starting from $\xi(0) = 0$ and attempting to reach $\xi(1) = 0$. Then the sign of the curvature of 
$\xi(q)$ may not be the same as the sign of $\xi(q)$ within the whole interval (0, 1), or else $\xi(1) = 0$ will never be reached. 
Since $D(q) > 0$ for $q < 1$, this implies $\Lambda > 0$. Thus we have demonstrated that the Hessian (7.60) is negative definite. 
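
The shooting argument can be illustrated numerically. The sketch below assumes the eigenvalue equation has the form $\Lambda\,\xi'' = -w(q)\,\xi$ with a positive weight $w(q) = D(q)^{-3}$; since $D(q)$ is not specified here, the constant weight $w \equiv 1$ is used as a hypothetical stand-in, and the function name `shoot` is ours. The integration uses a standard fourth-order Runge-Kutta step.

```python
import math

def shoot(lam, weight=lambda q: 1.0, steps=2000):
    """Integrate lam * xi'' = -weight(q) * xi from xi(0) = 0, xi'(0) = 1.

    Returns xi(1). `weight` stands in for 1 / D(q)^3 (hypothetical choice).
    """
    h = 1.0 / steps
    q, xi, dxi = 0.0, 0.0, 1.0

    def acc(q_, xi_):
        return -weight(q_) * xi_ / lam

    for _ in range(steps):
        k1x, k1v = dxi, acc(q, xi)
        k2x, k2v = dxi + 0.5 * h * k1v, acc(q + 0.5 * h, xi + 0.5 * h * k1x)
        k3x, k3v = dxi + 0.5 * h * k2v, acc(q + 0.5 * h, xi + 0.5 * h * k2x)
        k4x, k4v = dxi + h * k3v, acc(q + h, xi + h * k3x)
        xi += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6.0
        dxi += h * (k1v + 2 * k2v + 2 * k3v + k4v) / 6.0
        q += h
    return xi

# Positive eigenvalue: the shot returns to zero at q = 1 (xi ~ sin(pi q)).
print(shoot(1.0 / math.pi**2))
# Negative eigenvalues: curvature has the sign of xi, so xi(1) stays positive.
print([shoot(-lam) for lam in (0.1, 1.0, 10.0)])
```

For the constant weight the admissible eigenvalues are $\Lambda = 1/(k\pi)^2$, all positive, while for any negative $\Lambda$ the solution grows monotonically away from zero, which is the content of the argument above.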